Mrkumar007 committed
Commit a49c996 · verified · 1 Parent(s): 4e8be23

Upload folder using huggingface_hub
Dockerfile ADDED
@@ -0,0 +1,81 @@
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
+ # All rights reserved.
+ #
+ # This source code is licensed under the BSD-style license found in the
+ # LICENSE file in the root directory of this source tree.
+
+ # Multi-stage build using openenv-base
+ # This Dockerfile is flexible and works for both:
+ # - In-repo environments (with local OpenEnv sources)
+ # - Standalone environments (with openenv from PyPI/Git)
+ # The build script (openenv build) handles context detection and sets appropriate build args.
+
+ ARG BASE_IMAGE=ghcr.io/meta-pytorch/openenv-base:latest
+ FROM ${BASE_IMAGE} AS builder
+
+ WORKDIR /app
+
+ # Ensure git is available (required for installing dependencies from VCS)
+ RUN apt-get update && \
+     apt-get install -y --no-install-recommends git && \
+     rm -rf /var/lib/apt/lists/*
+
+ # Build argument to control whether we're building standalone or in-repo
+ ARG BUILD_MODE=in-repo
+ ARG ENV_NAME=cloud_queue_env
+
+ # Copy environment code (always at root of build context)
+ COPY . /app/env
+
+ # For in-repo builds, openenv is already vendored in the build context
+ # For standalone builds, openenv will be installed via pyproject.toml
+ WORKDIR /app/env
+
+ # Ensure uv is available (for local builds where the base image lacks it)
+ RUN if ! command -v uv >/dev/null 2>&1; then \
+         curl -LsSf https://astral.sh/uv/install.sh | sh && \
+         mv /root/.local/bin/uv /usr/local/bin/uv && \
+         mv /root/.local/bin/uvx /usr/local/bin/uvx; \
+     fi
+
+ # Install dependencies using uv sync
+ # If uv.lock exists, use it; otherwise resolve on the fly
+ RUN --mount=type=cache,target=/root/.cache/uv \
+     if [ -f uv.lock ]; then \
+         uv sync --frozen --no-install-project --no-editable; \
+     else \
+         uv sync --no-install-project --no-editable; \
+     fi
+
+ RUN --mount=type=cache,target=/root/.cache/uv \
+     if [ -f uv.lock ]; then \
+         uv sync --frozen --no-editable; \
+     else \
+         uv sync --no-editable; \
+     fi
+
+ # Final runtime stage
+ FROM ${BASE_IMAGE}
+
+ WORKDIR /app
+
+ # Copy the virtual environment from builder
+ COPY --from=builder /app/env/.venv /app/.venv
+
+ # Copy the environment code
+ COPY --from=builder /app/env /app/env
+
+ # Set PATH to use the virtual environment
+ ENV PATH="/app/.venv/bin:$PATH"
+
+ # Set PYTHONPATH so imports work correctly
+ ENV PYTHONPATH="/app/env:$PYTHONPATH"
+
+ # Health check
+ HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
+     CMD curl -f http://localhost:8000/health || exit 1
+
+ # Run the FastAPI server
+ # The module path is constructed to work with the /app/env structure
+ ENV ENABLE_WEB_INTERFACE=true
+ CMD ["sh", "-c", "cd /app/env && uvicorn server.app:app --host 0.0.0.0 --port 8000"]
HIGH_SEVERITY_ANALYSIS.md ADDED
@@ -0,0 +1,63 @@
+ # Cloud Queue Env - High Severity Analysis (Updated)
+
+ Date: 2026-04-12
+
+ This note captures the two highest-impact issues still present in the environment logic.
+
+ ## 1) Arrival Modeling and Arrival Metrics Mismatch
+
+ Files and lines:
+ - cloud_queue_env/server/cloud_queue_env_environment.py:240
+ - cloud_queue_env/server/cloud_queue_env_environment.py:241
+ - cloud_queue_env/server/cloud_queue_env_environment.py:248
+ - cloud_queue_env/server/cloud_queue_env_environment.py:259
+
+ What happens now:
+ - The simulator samples Poisson arrivals each step.
+ - If the sampled arrival count is greater than 1, the code still creates only one incoming job object.
+ - The arrivals metric is incremented by 1.0, not by the sampled arrival count.
+
+ Why this is high severity:
+ - Burst behavior is compressed into a single-event stream, so load spikes are underrepresented.
+ - Several business metrics and grader components become biased (rejections, abandonment, SLA pressure).
+ - Policy rankings can drift because the environment under-penalizes burst scenarios.
+
+ Impact on benchmark credibility:
+ - High. This directly affects realism, fairness of grading, and reproducibility claims.
+
+ Recommended fix direction:
+ - Track all sampled arrivals each step.
+ - Either queue all arrivals or maintain an explicit backlog of pending incoming jobs.
+ - Increment the arrivals metric by the true sampled count.
+
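The fix direction above can be sketched in a few lines. This is an illustration only, not the environment's actual code: `pending`, `arrivals_total`, and the job dict are hypothetical names, and the Poisson count is drawn with Knuth's algorithm so the sketch stays self-contained.

```python
import math
import random

def step_arrivals(rng: random.Random, lam: float, pending: list, arrivals_total: float) -> float:
    """Queue every sampled arrival and count all of them (hypothetical names)."""
    # Sample a Poisson(lam) count via Knuth's algorithm (deterministic given rng state).
    threshold = math.exp(-lam)
    n, p = 0, 1.0
    while True:
        n += 1
        p *= rng.random()
        if p <= threshold:
            break
    n -= 1
    # Create one job object per sampled arrival instead of at most one.
    for _ in range(n):
        pending.append({"size": 1})
    # Increment the arrivals metric by the true sampled count, not a flat 1.0.
    return arrivals_total + float(n)
```

With a seeded `rng`, the same seed and step sequence reproduces the same arrival trace, which is what the determinism requirements elsewhere in this repo demand.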
+ ## 2) Agent Dispatch Control Is Partially Bypassed by Autodispatch
+
+ Files and lines:
+ - cloud_queue_env/server/cloud_queue_env_environment.py:353
+ - cloud_queue_env/server/cloud_queue_env_environment.py:391
+ - cloud_queue_env/server/cloud_queue_env_environment.py:738
+
+ What happens now:
+ - The agent may choose an action that is not dispatch.
+ - After the action is applied, the environment still runs autodispatch and moves work to idle servers.
+
+ Why this is high severity:
+ - It weakens action-to-outcome causality for dispatch decisions.
+ - A policy can look better than it should because server assignment still happens automatically.
+ - It reduces benchmark difficulty in exactly the control surface the task is evaluating.
+
+ Impact on benchmark credibility:
+ - High. This can alter policy comparisons and invalidate assumptions about explicit control.
+
+ Recommended fix direction:
+ - Make dispatch behavior explicit by mode:
+   - strict-control mode: only the agent dispatches.
+   - assisted mode: autodispatch stays on, but document this clearly and score accordingly.
+ - Keep one consistent mode for official benchmark scoring.
+
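A mode gate like the following would make the dispatch authority explicit. The class and field names are illustrative, not the environment's API:

```python
from dataclasses import dataclass, field

@dataclass
class DispatchController:
    """Illustrative dispatch-authority gate (names are hypothetical)."""
    mode: str = "strict-control"  # or "assisted"
    dispatched: list = field(default_factory=list)

    def apply(self, action_type: str, queue: list, idle_servers: list) -> None:
        # An explicit agent dispatch works in both modes.
        if action_type == "dispatch" and queue and idle_servers:
            self.dispatched.append((queue.pop(0), idle_servers.pop(0)))
        # Autodispatch runs only in assisted mode; strict-control leaves
        # undispatched work queued, so action-to-outcome causality holds.
        if self.mode == "assisted":
            while queue and idle_servers:
                self.dispatched.append((queue.pop(0), idle_servers.pop(0)))
```

Pinning the official scoring runs to one `mode` value keeps policy comparisons on equal footing.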
+ ## Priority Summary
+
+ 1. Fix arrival accounting and multi-arrival handling first.
+ 2. Fix dispatch authority semantics second.
+
+ Both should be addressed before claiming benchmark-grade reliability.
IMPLEMENTATION_ROADMAP.md ADDED
@@ -0,0 +1,272 @@
+ # QueueOps OpenEnv Implementation Roadmap
+
+ This file is the execution reference for building and iterating the queue operations environment.
+
+ Scope constraints:
+ - Keep the current repository structure unchanged.
+ - Use cloud_queue_env as the project root.
+ - Follow OpenEnv compliance strictly.
+ - Provide deterministic graders with partial scores in [0, 1].
+ - Keep at least 3 benchmark tasks (easy, medium, hard).
+
+ ---
+
+ ## V1 - MVP Submission Build
+
+ Goal: ship a complete, valid benchmark that can be submitted.
+
+ ### Phase 1 - Environment Core
+ Sub-goals:
+ 1. Replace template echo behavior with queue simulator dynamics.
+ 2. Implement deterministic state transitions using explicit seeds.
+ 3. Implement terminal conditions with fixed task horizons.
+ 4. Keep the OpenEnv contract: reset, step, state.
+
+ Exit criteria:
+ 1. reset/step/state are stable and deterministic for a fixed seed + fixed action trace.
+ 2. Episodes terminate correctly.
+
+ ### Phase 2 - Task Pack (Easy/Medium/Hard)
+ Sub-goals:
+ 1. Add a task selector and fixed per-task configs.
+ 2. Easy: single queue with admission/dispatch control.
+ 3. Medium: multi-server with class-aware routing.
+ 4. Hard: two-stage queue network with scaling decisions.
+
+ Exit criteria:
+ 1. All three tasks run end-to-end.
+ 2. Difficulty progression is visible from easy to hard.
+
+ ### Phase 3 - Deterministic Graders
+ Sub-goals:
+ 1. Implement per-task score equations with partial credit.
+ 2. Clamp all task scores to [0, 1].
+ 3. Handle edge cases (NaN/Inf/missing metrics) safely.
+ 4. Add a final aggregate score across tasks.
+
+ Exit criteria:
+ 1. The same seeds and same actions always produce the same score.
+ 2. Scores are interpretable and bounded.
+
+ ### Phase 4 - Reward Shaping
+ Sub-goals:
+ 1. Add dense multi-component rewards (wait, throughput, SLA, cost, fairness, safety).
+ 2. Penalize invalid and exploit-like behavior.
+ 3. Keep the reward scale bounded and stable.
+ 4. Expose the component breakdown in metadata/info.
+
+ Exit criteria:
+ 1. Reward changes across the trajectory (not terminal-only).
+ 2. Unsafe behavior is consistently penalized.
+
+ ### Phase 5 - Inference Runner
+ Sub-goals:
+ 1. Run all benchmark tasks with fixed seeds.
+ 2. Use an OpenAI-compatible client with provider credentials from env variables.
+ 3. Emit [START], [STEP], [END] logs and a final [SUMMARY].
+ 4. Keep runs reproducible (fixed model params).
+
+ Exit criteria:
+ 1. The end-to-end benchmark run works locally and on the deployed runtime.
+ 2. Output format is submission-ready.
+
+ ### Phase 6 - Validation and Docs
+ Sub-goals:
+ 1. Pass openenv validate.
+ 2. Ensure the Docker build/run path works.
+ 3. Update the README with task, reward, grading, and baseline usage.
+ 4. Add a sample benchmark output snippet for evidence.
+
+ Exit criteria:
+ 1. Validation passes.
+ 2. The README is complete for judges and users.
+
+ ### V1 Submission Gate
+ All items must be true:
+ 1. Three tasks implemented and deterministic.
+ 2. Graders produce valid partial scores in [0, 1].
+ 3. The inference script runs all tasks and reports a summary.
+ 4. OpenEnv validation passes.
+ 5. The deployment path is functional.
+
+ ---
+
+ ## V2 - Robustness and Quality Upgrade
+
+ Goal: improve reliability, calibration, and benchmark trustworthiness.
+
+ ### Phase 1 - Determinism Hardening
+ Sub-goals:
+ 1. Separate RNG streams for arrivals/service/abandonment/shocks.
+ 2. Add a replay trace mode for debugging.
+ 3. Add deterministic episode metadata for audits.
+
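Separate RNG streams can be derived with the standard library alone, as in this sketch. The stream names and derivation scheme are assumptions; the real implementation might instead use numpy's `SeedSequence.spawn`:

```python
import random

def make_streams(seed: int) -> dict:
    """One independent RNG per stochastic process, derived from the episode seed."""
    names = ("arrivals", "service", "abandonment", "shocks")
    # String seeds are hashed deterministically by random.seed, so each
    # stream is reproducible across runs and decoupled from the others:
    # consuming arrival randomness never shifts the service-time sequence.
    return {name: random.Random(f"{seed}:{name}") for name in names}
```

The point of the split is auditability: a replayed episode can verify each process's draw sequence independently.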
+ ### Phase 2 - Difficulty Calibration
+ Sub-goals:
+ 1. Tune easy/medium/hard parameter separation.
+ 2. Improve anti-exploit balancing (reject-all, noop loops, over-scaling).
+ 3. Re-check reward and grade alignment across seeds.
+
+ ### Phase 3 - Reporting Upgrade
+ Sub-goals:
+ 1. Add a per-seed result table.
+ 2. Add mean/std and confidence summaries.
+ 3. Add failure/invalid-action diagnostics in the summary.
+
+ ### V2 Exit Criteria
+ 1. Lower variance for fixed seed sets.
+ 2. Clearer task progression and fairer scoring.
+ 3. Better debugging and reproducibility outputs.
+
+ ---
+
+ ## V3 - Extended Benchmark Pack
+
+ Goal: increase novelty and long-term benchmark value.
+
+ ### Phase 1 - Optional Task D
+ Sub-goals:
+ 1. Add stronger non-stationary demand patterns.
+ 2. Grade robustness to bursts and demand shifts.
+
+ ### Phase 2 - Optional Task E
+ Sub-goals:
+ 1. Add partial observability/noisy delayed metrics.
+ 2. Grade safe decision-making under uncertainty.
+
+ ### Phase 3 - Public Benchmarking Bundle
+ Sub-goals:
+ 1. Publish official seed suites and profiles (quick/standard/full).
+ 2. Provide reference baseline runs.
+ 3. Provide reproducibility notes for external users.
+
+ ### V3 Exit Criteria
+ 1. Four or more tasks available.
+ 2. Stronger novelty and benchmark coverage.
+ 3. Cleaner external benchmarking workflow.
+
+ ---
+
+ ## Recommended Execution Order
+
+ 1. Complete V1 and submit.
+ 2. Upgrade to V2 for reliability and scoring quality.
+ 3. Add V3 only if the timeline permits.
+
+ ## Current Status Snapshot
+
+ 1. The V1 core implementation is in place and running.
+ 2. openenv validate has passed.
+ 3. V2 determinism hardening, the calibration pass, and the reporting upgrade are implemented.
+ 4. Current focus shifts to V3 extensions and benchmark quality tuning.
+
+ ## V2 Completion Notes
+
+ Implemented outcomes:
+ 1. Separate RNG streams are active for arrivals, service, abandonment, and exogenous effects.
+ 2. Deterministic trace metadata is exposed (`trace_digest`, `seed`, and RNG stream seeds).
+ 3. Anti-exploit reward calibration includes rejection-heavy and harmful-downscale penalties.
+ 4. Inference supports multi-seed reporting with mean/std/ci95 outputs.
+ 5. Inference supports replay-mode action traces via file input for deterministic debugging.
+ 6. Inference supports JSON/CSV report export for per-seed analysis.
+
+ ---
+
+ ## Requirement Coverage Matrix (From requirementInfo.md)
+
+ This section is the final compliance tracker for judging criteria.
+
+ ### Functional Requirements
+
+ 1. Real-world task simulation
+    - Requirement: must represent real human operational work, not toy behavior.
+    - Implementation target: queue operations in a call-center/cloud/logistics-style flow.
+    - Evidence to keep: README motivation + task descriptions + action semantics.
+    - Status: in progress (core done; examples and narrative should be strengthened).
+
+ 2. OpenEnv spec compliance
+    - Requirement: typed models, reset, step(action), state, openenv.yaml, validate pass.
+    - Implementation target: models.py + server environment + openenv.yaml + app entrypoint.
+    - Evidence to keep: `openenv validate` output in PR notes/README.
+    - Status: done (validate passing).
+
+ 3. Minimum 3 tasks with deterministic graders
+    - Requirement: at least easy/medium/hard, deterministic 0.0-1.0 grading.
+    - Implementation target: task configs + per-task scoring formulas + clamping.
+    - Evidence to keep: a sample run showing all tasks and deterministic seeds.
+    - Status: done for 3 tasks; polish recommended for calibration.
+
+ 4. Meaningful reward function
+    - Requirement: dense trajectory signal + penalties for undesirable behavior.
+    - Implementation target: weighted reward components and safety penalties.
+    - Evidence to keep: reward component logging in metadata and README equations.
+    - Status: done; tune weights in V2.
+
+ 5. Baseline inference script
+    - Requirement: OpenAI-compatible client, env-var credentials, reproducible score over tasks.
+    - Implementation target: fixed tasks/seeds/model params, required log format.
+    - Evidence to keep: saved run logs and summary scores.
+    - Status: done; provider-fallback robustness can be improved.
+
+ ### Non-Functional Requirements
+
+ 1. Hugging Face Space deployment
+    - Requirement: containerized HF Space tagged openenv.
+    - Evidence to keep: Space URL + successful run proof.
+    - Status: done.
+
+ 2. Containerized execution
+    - Requirement: Dockerfile works with build + run.
+    - Evidence to keep: commands and a successful output snippet.
+    - Status: pending explicit evidence capture in docs.
+
+ 3. Documentation completeness
+    - Requirement: README includes env motivation, spaces, tasks, setup/usage, baseline scores.
+    - Evidence to keep: README sections + benchmark output table.
+    - Status: mostly done; baseline score table still needed.
+
+ ---
+
+ ## Evaluation Criteria Coverage Checklist
+
+ ### Real-world utility (30%)
+ 1. Keep README examples tied to concrete real-operations scenarios.
+ 2. Add one paragraph on why this benchmark is useful for agent evaluation.
+
+ ### Task and grader quality (25%)
+ 1. Keep the deterministic seed set fixed and documented.
+ 2. Show per-task scoring decomposition and bounded outputs.
+ 3. Add one reproducibility check note: same seed + same policy => same score.
+
+ ### Environment design (20%)
+ 1. Verify clean reset and sensible done boundaries for all tasks.
+ 2. Keep the action/observation schema stable and documented.
+ 3. Keep dense reward with interpretable components.
+
+ ### Code quality and spec compliance (15%)
+ 1. Keep `openenv validate` passing.
+ 2. Capture docker build/run commands and outcomes.
+ 3. Keep the deployment and ws route functional.
+
+ ### Creativity and novelty (10%)
+ 1. Emphasize queue-control benchmark novelty in the README.
+ 2. Keep the multi-objective reward and cost/fairness tradeoff visible.
+
+ ---
+
+ ## Pre-Submission Evidence Pack (Must Attach)
+
+ 1. Validation proof
+    - `openenv validate` success output.
+
+ 2. Runtime proof
+    - HF Space URL and one successful task run excerpt.
+
+ 3. Baseline proof
+    - One full [START]/[STEP]/[END]/[SUMMARY] run log.
+
+ 4. Docker proof
+    - `docker build` and `docker run` command results.
+
+ 5. Documentation proof
+    - README includes a baseline score table (easy, medium, hard, final).
README.md CHANGED
@@ -1,10 +1,369 @@
  ---
- title: Cloud Queue Env
- emoji: 😻
- colorFrom: red
- colorTo: purple
  sdk: docker
  pinned: false
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
  ---
+ title: Cloud Queue Env Environment Server
+ emoji: 🖨️
+ colorFrom: pink
+ colorTo: blue
  sdk: docker
  pinned: false
+ app_port: 8000
+ base_path: /web
+ tags:
+   - openenv
  ---

+ # Cloud Queue Env Environment
+
+ A real-world queue-operations benchmark for OpenEnv.
+
+ This environment simulates the service-operations decisions humans make in production systems:
+ - Admission and rejection under load
+ - Queue routing and dispatching
+ - Priority handling for urgent traffic
+ - Capacity scaling under infrastructure cost constraints
+
+ The benchmark includes three deterministic tasks with partial graders in [0, 1]:
+ - easy: single-queue stability
+ - medium: multi-server priority routing
+ - hard: two-stage queue network with scaling
+
+ ## Quick Start
+
+ Use the CloudQueueEnv client to connect to a running server or container:
+
+ ```python
+ from cloud_queue_env import CloudQueueAction, CloudQueueEnv
+
+ env = CloudQueueEnv.from_docker_image("cloud_queue_env-env:latest")
+ try:
+     # Configure task + seed, then reset into that deterministic episode
+     env.reset()
+     env.step(CloudQueueAction(action_type="configure_task", task_id="easy", seed=11))
+     result = env.reset()
+
+     for _ in range(20):
+         obs = result.observation
+         if obs.incoming_job_present:
+             action = CloudQueueAction(action_type="admit", target_queue=0)
+         else:
+             action = CloudQueueAction(action_type="dispatch", target_queue=0)
+
+         result = env.step(action)
+         print(
+             f"step={obs.sim_time} queues={obs.queue_lengths} "
+             f"reward={result.reward:.3f} done={result.done}"
+         )
+         if result.done:
+             break
+
+     final_score = result.observation.metadata.get("episode_score", 0.0)
+     print(f"episode_score={final_score:.3f}")
+ finally:
+     env.close()
+ ```
+
+ The CloudQueueEnv.from_docker_image() method handles:
+ - Starting the Docker container
+ - Waiting for the server to be ready
+ - Connecting to the environment
+ - Container cleanup when you call `close()`
+
+ ## Building the Docker Image
+
+ Before using the environment, build the Docker image:
+
+ ```bash
+ # From project root
+ docker build -t cloud_queue_env-env:latest -f server/Dockerfile .
+ ```
+
+ ## Deploying to Hugging Face Spaces
+
+ You can deploy your OpenEnv environment to Hugging Face Spaces using the `openenv push` command:
+
+ ```bash
+ # From the environment directory (where openenv.yaml is located)
+ openenv push
+
+ # Or specify options
+ openenv push --namespace my-org --private
+ ```
+
+ The `openenv push` command will:
+ 1. Validate that the directory is an OpenEnv environment (checks for `openenv.yaml`)
+ 2. Prepare a custom build for a Hugging Face Docker Space (enables the web interface)
+ 3. Upload to Hugging Face (ensuring you're logged in)
+
+ ### Prerequisites
+
+ - Authenticate with Hugging Face: the command will prompt for login if you are not already authenticated
+
+ ### Options
+
+ - `--directory`, `-d`: Directory containing the OpenEnv environment (defaults to the current directory)
+ - `--repo-id`, `-r`: Repository ID in the format 'username/repo-name' (defaults to 'username/env-name' from openenv.yaml)
+ - `--base-image`, `-b`: Base Docker image to use (overrides the Dockerfile FROM)
+ - `--private`: Deploy the Space as private (default: public)
+
+ ### Examples
+
+ ```bash
+ # Push to your personal namespace (defaults to username/env-name from openenv.yaml)
+ openenv push
+
+ # Push to a specific repository
+ openenv push --repo-id my-org/my-env
+
+ # Push with a custom base image
+ openenv push --base-image ghcr.io/meta-pytorch/openenv-base:latest
+
+ # Push as a private space
+ openenv push --private
+
+ # Combine options
+ openenv push --repo-id my-org/my-env --base-image custom-base:latest --private
+ ```
+
+ After deployment, your Space will be available at:
+ `https://huggingface.co/spaces/<repo-id>`
+
+ The deployed Space includes:
+ - **Web Interface** at `/web` - Interactive UI for exploring the environment
+ - **API Documentation** at `/docs` - Full OpenAPI/Swagger interface
+ - **Health Check** at `/health` - Container health monitoring
+ - **WebSocket** at `/ws` - Persistent session endpoint for low-latency interactions
+
+ ## Environment Details
+
+ ### Action
+ CloudQueueAction fields:
+ - action_type: one of configure_task, admit, reject, route, dispatch, scale, reprioritize, noop
+ - target_queue: queue index for route/dispatch/admit
+ - target_server: optional server index
+ - scale_delta: server delta for the scale action
+ - new_priority: new priority value for reprioritize
+ - task_id: easy/medium/hard (used with configure_task)
+ - seed: deterministic task seed (used with configure_task)
+
+ ### Observation
+ CloudQueueObservation includes:
+ - task_id, sim_time, horizon
+ - queue_lengths, queue_wait_ema
+ - server_busy, server_remaining_service, utilization
+ - incoming_job_present, incoming_job_size, incoming_job_priority, incoming_job_deadline, incoming_job_type
+ - sla_violation_rate, abandonment_rate, throughput_recent, energy_cost_rate
+ - level, optional_history, action_mask
+ - reward, done, metadata
+
+ ### Reward
+ Per-step reward is dense and multi-objective:
+
+ $$
+ r_t = 0.35R_{wait} + 0.20R_{throughput} + 0.20R_{sla} + 0.15R_{cost} + 0.05R_{fair} + 0.05R_{safe}
+ $$
+
+ Properties:
+ - Partial progress signal over the full trajectory
+ - Penalties for invalid actions and unsafe/noop behavior under congestion
+ - Bounded reward values for stability
+
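The weighted sum above can be computed as in this sketch. The component dict keys and the clamp range are assumptions for illustration, not the environment's exact code:

```python
def step_reward(components: dict) -> float:
    """Combine per-step reward components with the weights from the equation above."""
    weights = {"wait": 0.35, "throughput": 0.20, "sla": 0.20,
               "cost": 0.15, "fair": 0.05, "safe": 0.05}
    # Missing components contribute zero; weights sum to 1.0, so reward
    # stays in [-1, 1] whenever each component is in that range.
    r = sum(w * components.get(name, 0.0) for name, w in weights.items())
    return max(-1.0, min(1.0, r))  # keep the per-step reward bounded
```

Because the weights sum to 1, a policy that is perfect on every component earns exactly 1.0 per step.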
+ ### Deterministic Graders
+ Each task returns a deterministic episode_score in [0, 1], stored in observation metadata.
+
+ - easy score uses avg wait, throughput, rejection rate, and SLA violations
+ - medium score uses urgent/normal p95 waits, urgent SLA, throughput, and action cost
+ - hard score uses end-to-end p95, abandonment, SLA, throughput, infra cost, and fairness gap
+
+ If the invalid-action rate exceeds a threshold, the score is capped.
+
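The [0, 1] bound plus the NaN/Inf edge-case handling called for by the roadmap fits in one small helper. This is a sketch of that rule, not the shipped grader code:

```python
import math

def clamp_score(raw: float) -> float:
    """Map a raw grader value into [0, 1]; NaN and ±Inf collapse to 0.0."""
    # math.isfinite rejects NaN as well as both infinities, so a degraded
    # metric can never leak an unbounded or undefined score.
    if not math.isfinite(raw):
        return 0.0
    return max(0.0, min(1.0, raw))
```

Applying this at the very end of each task's score equation guarantees the bounded, deterministic output the graders promise.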
+ ## Tasks
+
+ 1. easy (single-queue stability)
+    - one queue, one server
+    - objective: low wait with acceptable throughput and low rejection
+
+ 2. medium (priority routing)
+    - two queues and multiple servers
+    - objective: protect urgent traffic while maintaining total performance
+
+ 3. hard (queue network + scaling)
+    - two-stage queue network with bursty arrivals and heavy-tailed service times
+    - objective: balance latency/SLA/abandonment against infra cost and fairness
+
+ ## Baseline Inference
+
+ Run baseline inference across easy/medium/hard:
+
+ ```bash
+ API_KEY=your_provider_key python inference.py
+ ```
+
+ Optional variables:
+ - API_KEY (OpenAI-compatible provider key for model calls)
+ - API_BASE_URL (default: https://router.huggingface.co/v1)
+ - MODEL_NAME (default: Qwen/Qwen2.5-72B-Instruct)
+ - BASE_URL (if using a deployed Space)
+ - IMAGE_NAME (if launching a local Docker image)
+ - USE_HEURISTIC_ONLY (true/false)
+ - DISABLE_MODEL_ON_FIRST_ERROR (true/false)
+ - MAX_STEPS_OVERRIDE (integer quick-test cap)
+ - TASK_SEEDS_JSON (JSON map for multi-seed runs)
+ - ACTION_TRACE_FILE (JSON replay file keyed by task:seed)
+ - REPORT_JSON_PATH (write seed/task report JSON)
+ - REPORT_CSV_PATH (write per-seed report CSV)
+
+ Output includes the required line types:
+ - [START]
+ - [STEP]
+ - [END]
+
+ And a final aggregate summary:
+ - [SUMMARY] easy=<...> medium=<...> hard=<...> final=<...>
+
+ V2 reporting also includes:
+ - [REPORT_SEED] task=<task_id> seed=<seed> score=<score> steps=<n> trace=<digest>
+ - [REPORT] task=<task_id> seeds=<n> mean=<score> std=<score> ci95=<score>
+
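The per-task aggregates in the [REPORT] line can be reproduced from per-seed scores as below. Note this assumes `ci95` is a normal-approximation half-width; the script's actual convention may differ:

```python
import statistics

def summarize(scores: list) -> dict:
    """Aggregate per-seed scores into the [REPORT] line's mean/std/ci95 fields."""
    mean = statistics.fmean(scores)
    # Sample standard deviation needs at least two seeds; a single-seed
    # run reports zero spread rather than raising.
    std = statistics.stdev(scores) if len(scores) > 1 else 0.0
    ci95 = 1.96 * std / len(scores) ** 0.5  # normal-approximation half-width
    return {"seeds": len(scores), "mean": mean, "std": std, "ci95": ci95}
```

For example, `summarize([0.0, 1.0])` reports a mean of 0.5 with a wide interval, flagging that two seeds are not enough for a stable estimate.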
228
+ ## Baseline Scores
229
+
230
+ Current reproducible heuristic-only baseline (deployed runtime, single seed per task):
231
+
232
+ | Task | Seed Count | Mean Score |
233
+ |---|---:|---:|
234
+ | easy | 1 | 0.000 |
235
+ | medium | 1 | 0.000 |
236
+ | hard | 1 | 0.000 |
237
+ | final (mean of task means) | - | 0.000 |
238
+
239
+ Notes:
240
+ - These values are from heuristic fallback mode and are expected to be low.
241
+ - Model-based scores depend on provider/model availability and should be recorded from a successful funded run.
242
+ - Keep this table updated with your latest official benchmark run before final submission.
243
+
244
+ ## Advanced Usage
245
+
246
+ ### Connecting to an Existing Server
247
+
248
+ If you already have a Cloud Queue Env environment server running, you can connect directly:
249
+
250
+ ```python
251
+ from cloud_queue_env import CloudQueueAction, CloudQueueEnv
252
+
253
+ # Connect to existing server
254
+ cloud_queue_envenv = CloudQueueEnv(base_url="<ENV_HTTP_URL_HERE>")
255
+
256
+ # Use as normal
257
+ result = cloud_queue_envenv.reset()
258
+ result = cloud_queue_envenv.step(CloudQueueAction(action_type="dispatch", target_queue=0))
259
+ ```
260
+
261
+ Note: When connecting to an existing server, `cloud_queue_envenv.close()` will NOT stop the server.
262
+
263
+ ### Using the Context Manager
264
+
265
+ The client supports context manager usage for automatic connection management:
266
+
267
+ ```python
268
+ from cloud_queue_env import CloudQueueAction, CloudQueueEnv
269
+
270
+ # Connect with context manager (auto-connects and closes)
271
+ with CloudQueueEnv(base_url="http://localhost:8000") as env:
272
+ result = env.reset()
273
+ print(f"Initial queues: {result.observation.queue_lengths}")
274
+ # Multiple steps with low latency
275
+ for _ in range(10):
276
+ result = env.step(CloudQueueAction(action_type="noop"))
277
+ print(f"Reward: {result.reward:.3f}")
278
+ ```
279
+
280
+ The client uses WebSocket connections for:
281
+ - **Lower latency**: No HTTP connection overhead per request
282
+ - **Persistent session**: Server maintains your environment state
283
+ - **Efficient for episodes**: Better for many sequential steps
284
+
285
+ ### Concurrent WebSocket Sessions
286
+
287
+ The server supports multiple concurrent WebSocket connections. To enable this,
288
+ modify `server/app.py` to use factory mode:
289
+
290
+ ```python
291
+ # In server/app.py - use factory mode for concurrent sessions
292
+ app = create_app(
293
+ CloudQueueEnvironment, # Pass class, not instance
294
+ CloudQueueAction,
295
+ CloudQueueObservation,
296
+ max_concurrent_envs=4, # Allow 4 concurrent sessions
297
+ )
298
+ ```
299
+
300
+ Then multiple clients can connect simultaneously:
301
+
302
+ ```python
303
+ from cloud_queue_env import CloudQueueAction, CloudQueueEnv
304
+ from concurrent.futures import ThreadPoolExecutor
305
+
306
+ def run_episode(client_id: int):
307
+ with CloudQueueEnv(base_url="http://localhost:8000") as env:
308
+ result = env.reset()
309
+ for i in range(10):
310
+ result = env.step(CloudQueueAction(action_type="dispatch", target_queue=i % 2))
311
+ return client_id, result.observation.queue_lengths
312
+
313
+ # Run 4 episodes concurrently
314
+ with ThreadPoolExecutor(max_workers=4) as executor:
315
+ results = list(executor.map(run_episode, range(4)))
316
+ ```
317
+
318
+ ## Development & Testing
319
+
320
+ ### Direct Environment Testing
321
+
322
+ Core files:
323
+ - models: typed action/observation schema
324
+ - server environment: queue simulation, reward shaping, grading
325
+ - inference script: task sweep and benchmark logging
326
+
327
+ ### Running Locally
328
+
329
+ Run the server locally for development:
330
+
331
+ ```bash
332
+ uvicorn server.app:app --reload
333
+ ```
334
+
335
+ ## Project Structure
336
+
337
+ ```
338
+ cloud_queue_env/
339
+ ├── .dockerignore
340
+ ├── __init__.py
341
+ ├── README.md
342
+ ├── openenv.yaml
343
+ ├── pyproject.toml
344
+ ├── client.py
345
+ ├── models.py
346
+ ├── inference.py
347
+ ├── IMPLEMENTATION_ROADMAP.md
348
+ └── server/
349
+ ├── __init__.py
350
+ ├── cloud_queue_env_environment.py
351
+ ├── app.py
352
+ └── Dockerfile
353
+ ```
+
+ TASK A — Easy (150 steps)
+ Scenario: 1 queue, 1 server (M/M/1), only admit/reject/dispatch
+ Objective: Keep wait low while maintaining throughput
+ Grader: score = 0.40×(1-avg_wait/6) + 0.30×(throughput/70)
+         + 0.15×(1-rejection_rate/0.3) + 0.15×(1-sla_breaches/0.3)
+
+ TASK B — Medium (200 steps)
+ Scenario: 2 queues, 3 servers, 28% urgent jobs → route + reprioritize
+ Objective: Protect urgent SLA while not starving normal jobs
+ Grader: score = 0.35×urgent_wait_score + 0.25×urgent_sla_score
+         + 0.15×normal_wait_score + 0.15×throughput + 0.10×cost
+
+ TASK C — Hard (250 steps)
+ Scenario: 2-stage pipeline, 1–6 servers, heavy-tail service, abandonments
+ Objective: Maximize quality under budget with fairness
+ Grader: score = 0.25×e2e_latency + 0.20×abandonment + 0.20×sla
+         + 0.15×throughput + 0.10×cost + 0.10×fairness
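The Task A grader formula above can be sketched as a small scoring function. This is an illustrative reimplementation of the published weights, not the environment's actual grader code; in particular, clipping each term to [0, 1] before weighting is an assumption.

```python
def task_a_score(avg_wait: float, throughput: float,
                 rejection_rate: float, sla_breaches: float) -> float:
    """Illustrative Task A grader: weighted sum of normalized metrics.

    Clipping each term to [0, 1] is an assumption -- the real grader
    may normalize out-of-range metrics differently.
    """
    def clip(x: float) -> float:
        return max(0.0, min(1.0, x))

    return (
        0.40 * clip(1 - avg_wait / 6)          # wait term
        + 0.30 * clip(throughput / 70)         # throughput term
        + 0.15 * clip(1 - rejection_rate / 0.3)  # rejection term
        + 0.15 * clip(1 - sla_breaches / 0.3)    # SLA term
    )

# A perfect episode (no waits, full throughput, no rejections or
# SLA breaches) scores 1.0; the success threshold used by the
# inference runner is 0.60.
ideal = task_a_score(avg_wait=0.0, throughput=70, rejection_rate=0.0, sla_breaches=0.0)
```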
__init__.py ADDED
@@ -0,0 +1,16 @@
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
+ # All rights reserved.
+ #
+ # This source code is licensed under the BSD-style license found in the
+ # LICENSE file in the root directory of this source tree.
+
+ """Cloud Queue Env Environment."""
+
+ from .client import CloudQueueEnv
+ from .models import CloudQueueAction, CloudQueueObservation
+
+ __all__ = [
+     "CloudQueueAction",
+     "CloudQueueObservation",
+     "CloudQueueEnv",
+ ]
client.py ADDED
@@ -0,0 +1,123 @@
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
+ # All rights reserved.
+ #
+ # This source code is licensed under the BSD-style license found in the
+ # LICENSE file in the root directory of this source tree.
+
+ """Cloud Queue Env Environment Client."""
+
+ from typing import Dict
+
+ from openenv.core import EnvClient
+ from openenv.core.client_types import StepResult
+ from openenv.core.env_server.types import State
+
+ from .models import CloudQueueAction, CloudQueueObservation
+
+
+ class CloudQueueEnv(
+     EnvClient[CloudQueueAction, CloudQueueObservation, State]
+ ):
+     """
+     Client for the Cloud Queue Env Environment.
+
+     This client maintains a persistent WebSocket connection to the environment server,
+     enabling efficient multi-step interactions with lower latency.
+     Each client instance has its own dedicated environment session on the server.
+
+     Example:
+         >>> # Connect to a running server
+         >>> with CloudQueueEnv(base_url="http://localhost:8000") as client:
+         ...     result = client.reset()
+         ...     print(result.observation.queue_lengths)
+         ...
+         ...     result = client.step(CloudQueueAction(action_type="admit", target_queue=0))
+         ...     print(result.observation.throughput_recent)
+
+     Example with Docker:
+         >>> # Automatically start container and connect
+         >>> client = CloudQueueEnv.from_docker_image("cloud_queue_env-env:latest")
+         >>> try:
+         ...     result = client.reset()
+         ...     result = client.step(CloudQueueAction(action_type="dispatch", target_queue=0))
+         ... finally:
+         ...     client.close()
+     """
+
+     def _step_payload(self, action: CloudQueueAction) -> Dict:
+         """
+         Convert CloudQueueAction to JSON payload for step message.
+
+         Args:
+             action: CloudQueueAction instance
+
+         Returns:
+             Dictionary representation suitable for JSON encoding
+         """
+         return {
+             "action_type": action.action_type,
+             "target_queue": action.target_queue,
+             "target_server": action.target_server,
+             "scale_delta": action.scale_delta,
+             "new_priority": action.new_priority,
+             "task_id": action.task_id,
+             "seed": action.seed,
+         }
+
+     def _parse_result(self, payload: Dict) -> StepResult[CloudQueueObservation]:
+         """
+         Parse server response into StepResult[CloudQueueObservation].
+
+         Args:
+             payload: JSON response data from server
+
+         Returns:
+             StepResult with CloudQueueObservation
+         """
+         obs_data = payload.get("observation", {})
+         observation = CloudQueueObservation(
+             task_id=obs_data.get("task_id", "easy"),
+             sim_time=obs_data.get("sim_time", 0),
+             horizon=obs_data.get("horizon", 0),
+             queue_lengths=obs_data.get("queue_lengths", []),
+             queue_wait_ema=obs_data.get("queue_wait_ema", []),
+             server_busy=obs_data.get("server_busy", []),
+             server_remaining_service=obs_data.get("server_remaining_service", []),
+             utilization=obs_data.get("utilization", []),
+             incoming_job_present=obs_data.get("incoming_job_present", False),
+             incoming_job_size=obs_data.get("incoming_job_size", 0.0),
+             incoming_job_priority=obs_data.get("incoming_job_priority", 0),
+             incoming_job_deadline=obs_data.get("incoming_job_deadline", 0.0),
+             incoming_job_type=obs_data.get("incoming_job_type", 0),
+             sla_violation_rate=obs_data.get("sla_violation_rate", 0.0),
+             abandonment_rate=obs_data.get("abandonment_rate", 0.0),
+             throughput_recent=obs_data.get("throughput_recent", 0.0),
+             energy_cost_rate=obs_data.get("energy_cost_rate", 0.0),
+             level=obs_data.get("level", 1.0),
+             optional_history=obs_data.get("optional_history", []),
+             action_mask=obs_data.get("action_mask", []),
+             done=payload.get("done", False),
+             reward=payload.get("reward"),
+             metadata=obs_data.get("metadata", {}),
+         )
+
+         return StepResult(
+             observation=observation,
+             reward=payload.get("reward"),
+             done=payload.get("done", False),
+         )
+
+     def _parse_state(self, payload: Dict) -> State:
+         """
+         Parse server response into State object.
+
+         Args:
+             payload: JSON response from state request
+
+         Returns:
+             State object with episode_id and step_count
+         """
+         return State(
+             episode_id=payload.get("episode_id"),
+             step_count=payload.get("step_count", 0),
+         )
inference.py ADDED
@@ -0,0 +1,747 @@
1
+ """Baseline inference runner for the queue operations benchmark tasks."""
2
+
3
+ import asyncio
4
+ import csv
5
+ import json
6
+ import os
7
+ import statistics
8
+ import textwrap
9
+ from typing import Any, List, Optional
10
+ from urllib.parse import urlparse, urlunparse
11
+
12
+ from dotenv import load_dotenv
13
+ from openai import OpenAI
14
+
15
+ load_dotenv() # Load environment variables from .env file
16
+
17
+ from cloud_queue_env import CloudQueueAction, CloudQueueEnv, CloudQueueObservation
18
+
19
+
20
+ IMAGE_NAME = os.getenv("IMAGE_NAME")
21
+ BASE_URL = os.getenv("BASE_URL")
22
+
23
+
24
+ API_BASE_URL = os.getenv("API_BASE_URL") or "https://router.huggingface.co/v1"
25
+ MODEL_NAME = os.getenv("MODEL_NAME") or "Qwen/Qwen2.5-72B-Instruct"
26
+
27
+ API_KEY = os.getenv("API_KEY") or os.getenv("HF_TOKEN")
28
+
29
+ BENCHMARK = os.getenv("BENCHMARK", "queueops-openenv")
30
+ TASKS = ["easy", "medium", "hard"]
31
+ TASK_SEEDS_JSON = os.getenv("TASK_SEEDS_JSON")
32
+ SEEDS = [11, 23, 37]
33
+ TEMPERATURE = 0.2
34
+ MAX_TOKENS = 180
35
+ SUCCESS_SCORE_THRESHOLD = 0.60
36
+ USE_HEURISTIC_ONLY = os.getenv("USE_HEURISTIC_ONLY", "false").lower() in {"1", "true", "yes"}
37
+ DISABLE_MODEL_ON_FIRST_ERROR = os.getenv("DISABLE_MODEL_ON_FIRST_ERROR", "true").lower() in {"1", "true", "yes"}
38
+ MAX_STEPS_OVERRIDE = int(os.getenv("MAX_STEPS_OVERRIDE", "0") or "0")
39
+ ACTION_TRACE_FILE = os.getenv("ACTION_TRACE_FILE")
40
+ REPORT_JSON_PATH = os.getenv("REPORT_JSON_PATH")
41
+ REPORT_CSV_PATH = os.getenv("REPORT_CSV_PATH")
42
+
43
+ SYSTEM_PROMPT = textwrap.dedent(
44
+ """
45
+ You are an agent controlling a cloud queue scheduling environment.
46
+ Your goal: minimize wait times, SLA violations, and cost while maximizing throughput.
47
+
48
+ ACTIONS (return exactly one JSON object, no extra text):
49
+ {"action_type": "admit", "target_queue": 0} — accept incoming job into queue 0
50
+ {"action_type": "route", "target_queue": 1} — accept incoming job into queue 1 (medium/hard only)
51
+ {"action_type": "reject", "target_queue": null} — reject incoming job (use when queues are filling up)
52
+ {"action_type": "dispatch", "target_queue": 0} — move job from queue to an idle server
53
+ {"action_type": "reprioritize","new_priority": 2} — promote a normal job to urgent (medium/hard only)
54
+ {"action_type": "scale", "scale_delta": 1} — add 1 server (+1) or remove 1 server (-1) (hard only)
55
+ {"action_type": "noop", "target_queue": null} — do nothing
56
+
57
+ STRATEGY HINTS:
58
+ - REJECT jobs when queue fill is above 60% to prevent overflow and SLA breaches.
59
+ - ADMIT when queues have space and server is idle.
60
+ - DISPATCH after admitting to keep servers busy.
61
+ - On medium/hard: ROUTE urgent jobs (priority=2) to a less-loaded queue.
62
+ - On hard: SCALE up (+1) when queue_fill > 70% and cost allows; scale down when queues are empty.
63
+ - Negative reward means the system is struggling — change strategy.
64
+
65
+ Return ONLY valid JSON. No explanation.
66
+ """
67
+ ).strip()
68
+
69
+
70
+ ACTION_TYPES = (
71
+ "configure_task",
72
+ "admit",
73
+ "reject",
74
+ "route",
75
+ "dispatch",
76
+ "scale",
77
+ "reprioritize",
78
+ "noop",
79
+ )
80
+
81
+ TASK_ALLOWED_ACTIONS = {
82
+ "easy": {"admit", "reject", "dispatch", "noop"},
83
+ "medium": {"admit", "reject", "route", "dispatch", "reprioritize", "noop"},
84
+ "hard": {"admit", "reject", "route", "dispatch", "reprioritize", "scale", "noop"},
85
+ }
86
+
87
+ MODEL_ACTION_RESPONSE_FORMAT = {
88
+ "type": "json_schema",
89
+ "json_schema": {
90
+ "name": "cloud_queue_action",
91
+ "strict": True,
92
+ "schema": {
93
+ "type": "object",
94
+ "additionalProperties": False,
95
+ "required": [
96
+ "action_type",
97
+ "target_queue",
98
+ "target_server",
99
+ "scale_delta",
100
+ "new_priority",
101
+ ],
102
+ "properties": {
103
+ "action_type": {"type": "string", "enum": list(ACTION_TYPES)},
104
+ "target_queue": {"type": ["integer", "null"], "minimum": 0},
105
+ "target_server": {"type": ["integer", "null"], "minimum": 0},
106
+ "scale_delta": {"type": ["integer", "null"], "minimum": -2, "maximum": 2},
107
+ "new_priority": {"type": ["integer", "null"], "minimum": 0, "maximum": 3},
108
+ },
109
+ },
110
+ },
111
+ }
112
+
113
+ _SCHEMA_RESPONSE_FORMAT_FAILED = False
114
+
115
+
116
+ def log_start(task: str, env: str, model: str) -> None:
117
+ print(f"[START] task={task} env={env} model={model}", flush=True)
118
+
119
+
120
+ def log_step(step: int, action: str, reward: float, done: bool, error: Optional[str]) -> None:
121
+ error_val = error if error else "null"
122
+ done_val = str(done).lower()
123
+ print(
124
+ f"[STEP] step={step} action={action} reward={reward:.2f} done={done_val} error={error_val}",
125
+ flush=True,
126
+ )
127
+
128
+
129
+ def log_end(success: bool, steps: int, score: float, rewards: List[float]) -> None:
130
+ rewards_str = ",".join(f"{r:.2f}" for r in rewards)
131
+ print(f"[END] success={str(success).lower()} steps={steps} score={score:.3f} rewards={rewards_str}", flush=True)
132
+
133
+
134
+ def parse_task_seed_map() -> dict[str, list[int]]:
135
+ if TASK_SEEDS_JSON:
136
+ try:
137
+ data = json.loads(TASK_SEEDS_JSON)
138
+ task_map: dict[str, list[int]] = {}
139
+ for task_name, seeds in data.items():
140
+ parsed = [int(s) for s in seeds]
141
+ if parsed:
142
+ task_map[str(task_name)] = parsed
143
+ if task_map:
144
+ return task_map
145
+ except Exception as exc:
146
+ print(f"[DEBUG] Invalid TASK_SEEDS_JSON, falling back to defaults: {exc}", flush=True)
147
+
148
+ return {
149
+ "easy": [SEEDS[0]],
150
+ "medium": [SEEDS[1]],
151
+ "hard": [SEEDS[2]],
152
+ }
153
+
154
+
155
+ def _action_from_dict(data: dict) -> CloudQueueAction:
156
+ return CloudQueueAction(
157
+ action_type=str(data.get("action_type", "noop")),
158
+ target_queue=data.get("target_queue"),
159
+ target_server=data.get("target_server"),
160
+ scale_delta=data.get("scale_delta"),
161
+ new_priority=data.get("new_priority"),
162
+ )
163
+
164
+
165
+ def load_replay_actions() -> dict[str, list[CloudQueueAction]]:
166
+ if not ACTION_TRACE_FILE:
167
+ return {}
168
+
169
+ try:
170
+ with open(ACTION_TRACE_FILE, "r", encoding="utf-8") as f:
171
+ payload = json.load(f)
172
+ except Exception as exc:
173
+ print(f"[DEBUG] Failed to load ACTION_TRACE_FILE: {exc}", flush=True)
174
+ return {}
175
+
176
+ replay: dict[str, list[CloudQueueAction]] = {}
177
+ if isinstance(payload, dict):
178
+ for key, action_list in payload.items():
179
+ if not isinstance(action_list, list):
180
+ continue
181
+ parsed = []
182
+ for item in action_list:
183
+ if isinstance(item, dict):
184
+ parsed.append(_action_from_dict(item))
185
+ if parsed:
186
+ replay[str(key)] = parsed
187
+ return replay
188
+
189
+
190
+ def ci95(values: list[float]) -> float:
191
+ if len(values) <= 1:
192
+ return 0.0
193
+ std = statistics.pstdev(values)
194
+ return 1.96 * std / (len(values) ** 0.5)
195
+
196
+
197
+ def write_reports(seed_rows: list[dict], task_score_table: dict[str, list[float]]) -> None:
198
+ if REPORT_JSON_PATH:
199
+ report_payload = {
200
+ "seed_rows": seed_rows,
201
+ "task_summary": {
202
+ task: {
203
+ "mean": statistics.mean(scores) if scores else 0.0,
204
+ "std": statistics.pstdev(scores) if len(scores) > 1 else 0.0,
205
+ "ci95": ci95(scores),
206
+ "count": len(scores),
207
+ }
208
+ for task, scores in task_score_table.items()
209
+ },
210
+ }
211
+ try:
212
+ with open(REPORT_JSON_PATH, "w", encoding="utf-8") as f:
213
+ json.dump(report_payload, f, indent=2)
214
+ except Exception as exc:
215
+ print(f"[DEBUG] Failed to write REPORT_JSON_PATH: {exc}", flush=True)
216
+
217
+ if REPORT_CSV_PATH:
218
+ try:
219
+ with open(REPORT_CSV_PATH, "w", encoding="utf-8", newline="") as f:
220
+ writer = csv.DictWriter(
221
+ f,
222
+ fieldnames=[
223
+ "task",
224
+ "seed",
225
+ "score",
226
+ "steps",
227
+ "success",
228
+ "trace_digest",
229
+ "invalid_actions",
230
+ "harmful_scale_down",
231
+ ],
232
+ )
233
+ writer.writeheader()
234
+ for row in seed_rows:
235
+ writer.writerow(row)
236
+ except Exception as exc:
237
+ print(f"[DEBUG] Failed to write REPORT_CSV_PATH: {exc}", flush=True)
238
+
239
+
240
+ def build_obs_summary(obs: CloudQueueObservation, task_name: str) -> str:
241
+ """Build a rich, structured text summary of the observation for the LLM prompt."""
242
+ # Queue fill percentages — helps model know when to reject
243
+ max_sizes = {"easy": 28, "medium": 42, "hard": 64}
244
+ max_q = max_sizes.get(task_name, 30)
245
+ fills = [f"{l}/{max_q}({100*l//max_q}%)" for l in obs.queue_lengths]
246
+
247
+ # Server status
248
+ busy_count = sum(obs.server_busy)
249
+ total_servers = len(obs.server_busy)
250
+ servers_str = f"{busy_count}/{total_servers} busy"
251
+
252
+ # Incoming job info
253
+ if obs.incoming_job_present:
254
+ urgency = "URGENT" if obs.incoming_job_priority >= 2 else "normal"
255
+ incoming_str = f"YES [{urgency} size={obs.incoming_job_size:.1f} deadline={obs.incoming_job_deadline:.0f}]"
256
+ else:
257
+ incoming_str = "none"
258
+
259
+ return (
260
+ f"task={task_name} | "
261
+ f"queues={fills} | "
262
+ f"servers={servers_str} | "
263
+ f"incoming={incoming_str} | "
264
+ f"sla_breach={obs.sla_violation_rate:.3f} | "
265
+ f"abandonment={obs.abandonment_rate:.3f} | "
266
+ f"cost_rate={obs.energy_cost_rate:.3f}"
267
+ )
268
+
269
+
270
+ def build_user_prompt(step: int, obs_summary: str, last_reward: float, history: List[str], task_name: str) -> str:
271
+ history_block = "\n".join(history[-4:]) if history else "None"
272
+ return textwrap.dedent(
273
+ f"""
274
+ Step {step} | Last reward: {last_reward:.2f}
275
+ State: {obs_summary}
276
+ Recent actions:
277
+ {history_block}
278
+ Choose the best action now.
279
+ """
280
+ ).strip()
281
+
282
+
283
+ def choose_heuristic_action(task_name: str, queue_lengths: List[int], incoming_present: bool) -> CloudQueueAction:
284
+ if incoming_present:
285
+ if task_name == "hard" and len(queue_lengths) > 1 and queue_lengths[0] > queue_lengths[1]:
286
+ return CloudQueueAction(action_type="route", target_queue=1)
287
+ if task_name == "medium" and len(queue_lengths) > 1 and queue_lengths[1] < queue_lengths[0]:
288
+ return CloudQueueAction(action_type="route", target_queue=1)
289
+ return CloudQueueAction(action_type="admit", target_queue=0)
290
+ return CloudQueueAction(action_type="dispatch", target_queue=0)
291
+
292
+
293
+ def _coerce_optional_int(value: Any) -> Optional[int]:
294
+ if value is None:
295
+ return None
296
+ if isinstance(value, bool):
297
+ return int(value)
298
+ if isinstance(value, int):
299
+ return value
300
+ if isinstance(value, float):
301
+ return int(value)
302
+ if isinstance(value, str):
303
+ txt = value.strip().lower()
304
+ if txt in {"", "null", "none"}:
305
+ return None
306
+ try:
307
+ return int(txt)
308
+ except ValueError:
309
+ try:
310
+ return int(float(txt))
311
+ except ValueError:
312
+ return None
313
+ return None
314
+
315
+
316
+ def _extract_json_object(text: str) -> Optional[dict[str, Any]]:
317
+ cleaned = (text or "").strip()
318
+ if not cleaned:
319
+ return None
320
+
321
+ # Handle common fenced responses first.
322
+ if cleaned.startswith("```"):
323
+ chunks = [chunk.strip() for chunk in cleaned.split("```") if chunk.strip()]
324
+ for chunk in chunks:
325
+ candidate = chunk
326
+ if candidate.lower().startswith("json"):
327
+ candidate = candidate[4:].strip()
328
+ try:
329
+ parsed = json.loads(candidate)
330
+ if isinstance(parsed, dict):
331
+ return parsed
332
+ if isinstance(parsed, list) and parsed and isinstance(parsed[0], dict):
333
+ return parsed[0]
334
+ except Exception:
335
+ continue
336
+
337
+ try:
338
+ parsed = json.loads(cleaned)
339
+ if isinstance(parsed, dict):
340
+ return parsed
341
+ if isinstance(parsed, list) and parsed and isinstance(parsed[0], dict):
342
+ return parsed[0]
343
+ except Exception:
344
+ pass
345
+
346
+ # Fallback: extract the first balanced JSON object from noisy text.
347
+ start = 0
348
+ while True:
349
+ open_idx = cleaned.find("{", start)
350
+ if open_idx < 0:
351
+ return None
352
+ depth = 0
353
+ for i in range(open_idx, len(cleaned)):
354
+ ch = cleaned[i]
355
+ if ch == "{":
356
+ depth += 1
357
+ elif ch == "}":
358
+ depth -= 1
359
+ if depth == 0:
360
+ candidate = cleaned[open_idx : i + 1]
361
+ try:
362
+ parsed = json.loads(candidate)
363
+ if isinstance(parsed, dict):
364
+ return parsed
365
+ except Exception:
366
+ break
367
+ start = open_idx + 1
368
+
369
+
370
+ def _normalize_action_payload(data: dict[str, Any], task_name: str) -> Optional[dict[str, Any]]:
371
+ action_type = str(data.get("action_type", "noop")).strip().lower()
372
+ if action_type not in ACTION_TYPES:
373
+ return None
374
+
375
+ if action_type not in TASK_ALLOWED_ACTIONS.get(task_name, set(ACTION_TYPES)):
376
+ return None
377
+
378
+ target_queue = _coerce_optional_int(data.get("target_queue"))
379
+ target_server = _coerce_optional_int(data.get("target_server"))
380
+ scale_delta = _coerce_optional_int(data.get("scale_delta"))
381
+ new_priority = _coerce_optional_int(data.get("new_priority"))
382
+
383
+ if action_type in {"admit", "route", "dispatch"} and target_queue is None:
384
+ target_queue = 0
385
+ if action_type in {"reject", "noop"}:
386
+ target_queue = None
387
+ target_server = None
388
+
389
+ if action_type == "scale":
390
+ if scale_delta is None:
391
+ return None
392
+ scale_delta = max(-2, min(2, scale_delta))
393
+ else:
394
+ scale_delta = None
395
+
396
+ if action_type == "reprioritize":
397
+ if new_priority is None:
398
+ new_priority = 2
399
+ else:
400
+ new_priority = None
401
+
402
+ return {
403
+ "action_type": action_type,
404
+ "target_queue": target_queue,
405
+ "target_server": target_server,
406
+ "scale_delta": scale_delta,
407
+ "new_priority": new_priority,
408
+ }
409
+
410
+
411
+ def parse_model_action(text: str, task_name: str) -> Optional[CloudQueueAction]:
412
+ data = _extract_json_object(text)
413
+ if data is None:
414
+ return None
415
+
416
+ payload = _normalize_action_payload(data, task_name)
417
+ if payload is None:
418
+ return None
419
+
420
+ try:
421
+ return CloudQueueAction(**payload)
422
+ except Exception:
423
+ return None
424
+
425
+
426
+ def get_model_action(
427
+ client: OpenAI,
428
+ task_name: str,
429
+ step: int,
430
+ obs_summary: str,
431
+ last_reward: float,
432
+ history: List[str],
433
+ ) -> tuple[Optional[CloudQueueAction], Optional[str]]:
434
+ global _SCHEMA_RESPONSE_FORMAT_FAILED
435
+
436
+ user_prompt = build_user_prompt(step, obs_summary, last_reward, history, task_name)
437
+ messages = [
438
+ {"role": "system", "content": SYSTEM_PROMPT},
439
+ {"role": "user", "content": user_prompt},
440
+ ]
441
+
442
+ try:
443
+ if not _SCHEMA_RESPONSE_FORMAT_FAILED:
444
+ try:
445
+ completion = client.chat.completions.create(
446
+ model=MODEL_NAME,
447
+ messages=messages,
448
+ temperature=TEMPERATURE,
449
+ max_tokens=MAX_TOKENS,
450
+ stream=False,
451
+ response_format=MODEL_ACTION_RESPONSE_FORMAT,
452
+ )
453
+ except Exception as schema_exc:
454
+ _SCHEMA_RESPONSE_FORMAT_FAILED = True
455
+ print(
456
+ f"[DEBUG] response_format unavailable, retrying without schema: {schema_exc}",
457
+ flush=True,
458
+ )
459
+ completion = client.chat.completions.create(
460
+ model=MODEL_NAME,
461
+ messages=messages,
462
+ temperature=TEMPERATURE,
463
+ max_tokens=MAX_TOKENS,
464
+ stream=False,
465
+ )
466
+ else:
467
+ completion = client.chat.completions.create(
468
+ model=MODEL_NAME,
469
+ messages=messages,
470
+ temperature=TEMPERATURE,
471
+ max_tokens=MAX_TOKENS,
472
+ stream=False,
473
+ )
474
+
475
+ text = (completion.choices[0].message.content or "").strip()
476
+ action = parse_model_action(text, task_name)
477
+ if action is None:
478
+ preview = " ".join(text.split())[:180]
479
+ return None, f"invalid_model_action_payload: {preview}"
480
+ return action, None
481
+ except Exception as exc:
482
+ print(f"[DEBUG] Model request failed: {exc}", flush=True)
483
+ return None, str(exc)
484
+
485
+
486
+ def normalize_base_url(base_url: Optional[str]) -> Optional[str]:
487
+ """Normalize user-provided BASE_URL into an API runtime URL.
488
+
489
+ If a Hugging Face repo page URL is provided (huggingface.co/spaces/user/space),
490
+ convert it to the runtime domain (https://user-space.hf.space).
491
+ """
492
+ if not base_url:
493
+ return base_url
494
+
495
+ cleaned = base_url.strip().rstrip("/")
496
+ parsed = urlparse(cleaned)
497
+
498
+ # Handle Hugging Face repo page URL -> runtime URL used by API/WebSocket.
499
+ if parsed.netloc.lower() == "huggingface.co":
500
+ parts = [p for p in parsed.path.strip("/").split("/") if p]
501
+ if len(parts) >= 3 and parts[0] == "spaces":
502
+ owner, space = parts[1], parts[2]
503
+ # HF runtime hostnames use lowercase and are TLS-safe.
504
+ owner = owner.lower().replace("_", "-")
505
+ space = space.lower().replace("_", "-")
506
+ return f"https://{owner}-{space}.hf.space"
507
+
508
+ # Avoid accidentally pointing at the web UI path.
509
+ if cleaned.endswith("/web"):
510
+ cleaned = cleaned[:-4]
511
+ parsed = urlparse(cleaned)
512
+
513
+ # HF runtime domains should be lowercase and avoid underscores for TLS host checks.
514
+ host = (parsed.hostname or "").lower()
515
+ if host.endswith(".hf.space"):
516
+ safe_host = host.replace("_", "-")
517
+ if safe_host != host or (parsed.netloc and parsed.netloc != parsed.netloc.lower()):
518
+ port_part = f":{parsed.port}" if parsed.port else ""
519
+ netloc = f"{safe_host}{port_part}"
520
+ parsed = parsed._replace(netloc=netloc)
521
+ cleaned = urlunparse(parsed)
522
+
523
+ return cleaned
524
+
525
+
526
+ def _smoke_test_model(client: OpenAI) -> bool:
527
+ """Verify the model API is reachable AND can generate a coherent response.
528
+
529
+ Asks a short queue-domain question that requires a real sentence answer.
530
+ An empty or missing reply is treated as failure — not just exceptions.
531
+
532
+ Prints [MODEL_OK] or [MODEL_FAIL] with details.
533
+ Returns True if the model is working, False otherwise.
534
+ """
535
+ print(f"[MODEL_CHECK] Testing model={MODEL_NAME} at {API_BASE_URL} ...", flush=True)
536
+ test_question = (
537
+ "You are a cloud scheduling agent. "
538
+ "A job queue is 80% full and a new urgent job just arrived. "
539
+ "Should you admit the job, reject it, or route it to another queue? "
540
+ "Answer in one sentence and explain why."
541
+ )
542
+ try:
543
+ resp = client.chat.completions.create(
544
+ model=MODEL_NAME,
545
+ messages=[{"role": "user", "content": test_question}],
546
+ temperature=0.0,
547
+ max_tokens=80,
548
+ )
549
+ reply = (resp.choices[0].message.content or "").strip()
550
+ if not reply:
551
+ print("[MODEL_FAIL] Model returned an empty response.", flush=True)
552
+ print("[MODEL_FAIL] Will fall back to heuristic for all steps.", flush=True)
553
+ return False
554
+ print(f"[MODEL_OK] model is reasoning correctly.", flush=True)
555
+ print(f"[MODEL_OK] test reply: {reply}", flush=True)
556
+ return True
557
+ except Exception as exc:
558
+ print(f"[MODEL_FAIL] Cannot reach model: {exc}", flush=True)
559
+ print("[MODEL_FAIL] Will fall back to heuristic for all steps.", flush=True)
560
+ return False
561
+
562
+
563
+ async def main() -> None:
564
+ if not API_KEY and not USE_HEURISTIC_ONLY:
565
+ raise ValueError("API_KEY is required for model inference.")
566
+
567
+ client = None
568
+ if not USE_HEURISTIC_ONLY:
569
+ client = OpenAI(base_url=API_BASE_URL, api_key=API_KEY)
570
+ runtime_base_url = normalize_base_url(BASE_URL)
571
+
572
+ if runtime_base_url:
573
+ env = CloudQueueEnv(base_url=runtime_base_url)
574
+ else:
575
+ if not IMAGE_NAME:
576
+ raise ValueError(
577
+ "Set BASE_URL for deployed env, or IMAGE_NAME for local docker env."
578
+ )
579
+ env = await CloudQueueEnv.from_docker_image(IMAGE_NAME)
580
+
581
+ try:
582
+ # Run smoke test before benchmark — confirms model API is reachable.
583
+ model_enabled = client is not None
584
+ if client is not None:
585
+ model_enabled = _smoke_test_model(client)
586
+ task_seed_map = parse_task_seed_map()
587
+ replay_map = load_replay_actions()
588
+ task_score_table: dict[str, list[float]] = {}
589
+ seed_rows: list[dict] = []
590
+
591
+ for task_name in TASKS:
592
+ seeds = task_seed_map.get(task_name, [])
593
+ if not seeds:
594
+ continue
595
+
596
+ task_score_table[task_name] = []
597
+
598
+ for seed in seeds:
599
+ history: List[str] = []
600
+ rewards: List[float] = []
601
+ steps_taken = 0
602
+ score = 0.0
603
+ success = False
604
+
605
+ log_start(task=task_name, env=BENCHMARK, model=MODEL_NAME)
606
+
607
+ await env.reset()
608
+ await env.step(
609
+ CloudQueueAction(action_type="configure_task", task_id=task_name, seed=seed)
610
+ )
611
+ result = await env.reset()
612
+ last_reward = 0.0
613
+ max_steps = max(1, int(result.observation.horizon))
614
+ if MAX_STEPS_OVERRIDE > 0:
615
+ max_steps = min(max_steps, MAX_STEPS_OVERRIDE)
616
+
617
+ for step in range(1, max_steps + 1):
618
+ if result.done:
619
+ break
620
+
621
+ obs = result.observation
622
+ obs_summary = build_obs_summary(obs, task_name)
623
+
624
+ action = None
625
+ model_error = None
626
+ replay_key = f"{task_name}:{seed}"
627
+ replay_actions = replay_map.get(replay_key, [])
628
+ if step - 1 < len(replay_actions):
629
+ action = replay_actions[step - 1]
630
+
631
+ if action is None and model_enabled and client is not None:
632
+ action, model_error = get_model_action(
633
+ client=client,
634
+ task_name=task_name,
635
+ step=step,
636
+ obs_summary=obs_summary,
637
+ last_reward=last_reward,
638
+ history=history,
639
+ )
640
+ if model_error and DISABLE_MODEL_ON_FIRST_ERROR:
641
+ model_enabled = False
642
+ print("[DEBUG] Disabling model calls and switching to heuristic fallback.", flush=True)
643
+
644
+ if action is None:
645
+ action = choose_heuristic_action(
646
+ task_name=task_name,
647
+ queue_lengths=obs.queue_lengths,
648
+ incoming_present=obs.incoming_job_present,
649
+ )
650
+
651
+ result = await env.step(action)
652
+ reward = float(result.reward or 0.0)
653
+ done = bool(result.done)
654
+ error = None
655
+ meta = result.observation.metadata or {}
656
+ info = meta.get("info", {}) if isinstance(meta, dict) else {}
657
+ if isinstance(info, dict) and info.get("valid_action") is False:
658
+ error = str(info.get("note", "invalid_action"))
659
+
660
+ rewards.append(reward)
661
+ steps_taken = step
662
+ last_reward = reward
663
+
664
+ action_str = (
665
+ f"{action.action_type}(q={action.target_queue},s={action.target_server},"
666
+ f"d={action.scale_delta},p={action.new_priority})"
667
+ )
668
+ log_step(step=step, action=action_str, reward=reward, done=done, error=error)
669
+
670
+ history.append(f"step={step} action={action_str} reward={reward:.2f}")
671
+
672
+ if done:
673
+ break
674
+
675
+ if isinstance(result.observation.metadata, dict):
676
+                 score = float(result.observation.metadata.get("episode_score", 0.0) or 0.0)
+                 # Debug: print raw server metadata so we can verify grader output
+                 _m = result.observation.metadata
+                 print(
+                     f"[DEBUG_META] task={task_name} seed={seed} "
+                     f"episode_score={_m.get('episode_score')} "
+                     f"score_details={_m.get('score_details')} "
+                     f"metrics_completed={_m.get('metrics', {}).get('completed')} "
+                     f"metrics_arrivals={_m.get('metrics', {}).get('arrivals')}",
+                     flush=True,
+                 )
+                 score = max(0.0, min(1.0, score))
+                 task_score_table[task_name].append(score)
+                 success = score >= SUCCESS_SCORE_THRESHOLD
+                 log_end(success=success, steps=steps_taken, score=score, rewards=rewards)
+
+                 meta = result.observation.metadata or {}
+                 metrics = meta.get("metrics", {}) if isinstance(meta, dict) else {}
+                 seed_row = {
+                     "task": task_name,
+                     "seed": int(seed),
+                     "score": round(score, 6),
+                     "steps": int(steps_taken),
+                     "success": bool(success),
+                     "trace_digest": str(meta.get("trace_digest", "")),
+                     "invalid_actions": float(metrics.get("invalid_actions", 0.0)),
+                     "harmful_scale_down": float(metrics.get("harmful_scale_down", 0.0)),
+                 }
+                 seed_rows.append(seed_row)
+                 print(
+                     "[REPORT_SEED] "
+                     f"task={seed_row['task']} seed={seed_row['seed']} score={seed_row['score']:.3f} "
+                     f"steps={seed_row['steps']} trace={seed_row['trace_digest']}",
+                     flush=True,
+                 )
+
+             task_scores = task_score_table[task_name]
+             task_mean = statistics.mean(task_scores) if task_scores else 0.0
+             task_std = statistics.pstdev(task_scores) if len(task_scores) > 1 else 0.0
+             task_ci = ci95(task_scores)
+             print(
+                 f"[REPORT] task={task_name} seeds={len(task_scores)} mean={task_mean:.3f} std={task_std:.3f} ci95={task_ci:.3f}",
+                 flush=True,
+             )
+
+         all_task_means = []
+         for task_name in TASKS:
+             scores = task_score_table.get(task_name, [])
+             if scores:
+                 all_task_means.append(statistics.mean(scores))
+
+         if all_task_means:
+             final_score = sum(all_task_means) / len(all_task_means)
+             easy_mean = statistics.mean(task_score_table.get("easy", [0.0]))
+             medium_mean = statistics.mean(task_score_table.get("medium", [0.0]))
+             hard_mean = statistics.mean(task_score_table.get("hard", [0.0]))
+             print(
+                 f"[SUMMARY] easy={easy_mean:.3f} medium={medium_mean:.3f} hard={hard_mean:.3f} final={final_score:.3f}",
+                 flush=True,
+             )
+
+         write_reports(seed_rows=seed_rows, task_score_table=task_score_table)
+
+     finally:
+         try:
+             await env.close()
+         except Exception as e:
+             print(f"[DEBUG] env.close() error (container cleanup): {e}", flush=True)
+
+
+ if __name__ == "__main__":
+     asyncio.run(main())
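The per-task aggregation above (per-task mean/std, a normal-approximation 95% confidence half-width, and a final score equal to the mean of task means) can be checked in isolation. A minimal sketch, mirroring the `ci95` formula used by these runners (1.96 · population std / √n via `statistics.pstdev`); the example scores are illustrative, not real benchmark output:

```python
import statistics


def ci95(values: list[float]) -> float:
    # Normal-approximation half-width: 1.96 * population std / sqrt(n).
    if len(values) <= 1:
        return 0.0
    return 1.96 * statistics.pstdev(values) / (len(values) ** 0.5)


# Aggregate the way the runner does: per-task mean, then mean of task means.
task_score_table = {"easy": [0.8, 0.9], "medium": [0.5, 0.7], "hard": [0.2, 0.4]}
task_means = [statistics.mean(v) for v in task_score_table.values()]
final_score = sum(task_means) / len(task_means)
print(round(final_score, 3))  # mean of 0.85, 0.60, 0.30
```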
inference2.py ADDED
@@ -0,0 +1,751 @@
+ """Strict model-only inference runner for the queue operations benchmark.
+
+ This variant intentionally removes heuristic fallback paths.
+ Every decision must come from either:
+ 1) replay trace input (ACTION_TRACE_FILE), or
+ 2) model output.
+
+ If model output is invalid/unavailable, the seed run is marked failed.
+ """
+
+ import asyncio
+ import csv
+ import json
+ import os
+ import statistics
+ import textwrap
+ from typing import Any, List, Optional
+ from urllib.parse import urlparse, urlunparse
+
+ from dotenv import load_dotenv
+ from openai import OpenAI
+
+ load_dotenv()
+
+ from cloud_queue_env import CloudQueueAction, CloudQueueEnv, CloudQueueObservation
+
+
+ IMAGE_NAME = os.getenv("IMAGE_NAME")
+ BASE_URL = os.getenv("BASE_URL")
+
+ API_BASE_URL = os.getenv("API_BASE_URL") or "https://router.huggingface.co/v1"
+ MODEL_NAME = os.getenv("MODEL_NAME") or "Qwen/Qwen2.5-72B-Instruct"
+ API_KEY = os.getenv("API_KEY") or os.getenv("HF_TOKEN")
+
+ BENCHMARK = os.getenv("BENCHMARK", "queueops-openenv")
+ TASKS = ["easy", "medium", "hard"]
+ TASK_SEEDS_JSON = os.getenv("TASK_SEEDS_JSON")
+ SEEDS = [11, 23, 37]
+ TEMPERATURE = 0.2
+ MAX_TOKENS = 780
+ SUCCESS_SCORE_THRESHOLD = 0.60
+ MAX_STEPS_OVERRIDE = int(os.getenv("MAX_STEPS_OVERRIDE", "0") or "0")
+ ACTION_TRACE_FILE = os.getenv("ACTION_TRACE_FILE")
+ REPORT_JSON_PATH = os.getenv("REPORT_JSON_PATH")
+ REPORT_CSV_PATH = os.getenv("REPORT_CSV_PATH")
+
+ SYSTEM_PROMPT = textwrap.dedent(
+     """
+     You are an agent controlling a cloud queue scheduling environment.
+     Your goal: minimize wait times, SLA violations, and cost while maximizing throughput.
+
+     Return exactly one JSON object and no extra text.
+
+     ACTIONS:
+     {"action_type": "admit", "target_queue": 0}
+     {"action_type": "route", "target_queue": 1}
+     {"action_type": "reject", "target_queue": null}
+     {"action_type": "dispatch", "target_queue": 0}
+     {"action_type": "reprioritize", "new_priority": 2}
+     {"action_type": "scale", "scale_delta": 1}
+     {"action_type": "noop", "target_queue": null}
+
+     Constraints:
+     - easy: use admit/reject/dispatch/noop only
+     - medium: use admit/reject/route/dispatch/reprioritize/noop only
+     - hard: use admit/reject/route/dispatch/reprioritize/scale/noop only
+
+     No explanation. JSON only.
+     """
+ ).strip()
+
+ ACTION_TYPES = (
+     "configure_task",
+     "admit",
+     "reject",
+     "route",
+     "dispatch",
+     "scale",
+     "reprioritize",
+     "noop",
+ )
+
+ TASK_ALLOWED_ACTIONS = {
+     "easy": {"admit", "reject", "dispatch", "noop"},
+     "medium": {"admit", "reject", "route", "dispatch", "reprioritize", "noop"},
+     "hard": {"admit", "reject", "route", "dispatch", "reprioritize", "scale", "noop"},
+ }
+
+ ACTION_PAYLOAD_PROPERTIES = {
+     "target_queue": {"type": ["integer", "null"], "minimum": 0},
+     "target_server": {"type": ["integer", "null"], "minimum": 0},
+     "scale_delta": {"type": ["integer", "null"], "minimum": -2, "maximum": 2},
+     "new_priority": {"type": ["integer", "null"], "minimum": 0, "maximum": 3},
+ }
+
+ _SCHEMA_RESPONSE_FORMAT_FAILED = False
+
+
+ def log_start(task: str, env: str, model: str) -> None:
+     print(f"[START] task={task} env={env} model={model}", flush=True)
+
+
+ def log_step(step: int, action: str, reward: float, done: bool, error: Optional[str]) -> None:
+     error_val = error if error else "null"
+     done_val = str(done).lower()
+     print(
+         f"[STEP] step={step} action={action} reward={reward:.2f} done={done_val} error={error_val}",
+         flush=True,
+     )
+
+
+ def log_end(success: bool, steps: int, score: float, rewards: List[float]) -> None:
+     rewards_str = ",".join(f"{r:.2f}" for r in rewards)
+     print(f"[END] success={str(success).lower()} steps={steps} score={score:.3f} rewards={rewards_str}", flush=True)
+
+
+ def model_action_response_format(task_name: str) -> dict[str, Any]:
+     allowed = sorted(TASK_ALLOWED_ACTIONS.get(task_name, set(ACTION_TYPES)))
+     return {
+         "type": "json_schema",
+         "json_schema": {
+             "name": f"cloud_queue_action_{task_name}",
+             "strict": True,
+             "schema": {
+                 "type": "object",
+                 "additionalProperties": False,
+                 "required": [
+                     "action_type",
+                     "target_queue",
+                     "target_server",
+                     "scale_delta",
+                     "new_priority",
+                 ],
+                 "properties": {
+                     "action_type": {"type": "string", "enum": allowed},
+                     **ACTION_PAYLOAD_PROPERTIES,
+                 },
+             },
+         },
+     }
+
+
+ def parse_task_seed_map() -> dict[str, list[int]]:
+     if TASK_SEEDS_JSON:
+         try:
+             data = json.loads(TASK_SEEDS_JSON)
+             task_map: dict[str, list[int]] = {}
+             for task_name, seeds in data.items():
+                 parsed = [int(s) for s in seeds]
+                 if parsed:
+                     task_map[str(task_name)] = parsed
+             if task_map:
+                 return task_map
+         except Exception as exc:
+             print(f"[DEBUG] Invalid TASK_SEEDS_JSON, falling back to defaults: {exc}", flush=True)
+
+     return {
+         "easy": [SEEDS[0]],
+         "medium": [SEEDS[1]],
+         "hard": [SEEDS[2]],
+     }
+
+
+ def _action_from_dict(data: dict) -> CloudQueueAction:
+     return CloudQueueAction(
+         action_type=str(data.get("action_type", "noop")),
+         target_queue=data.get("target_queue"),
+         target_server=data.get("target_server"),
+         scale_delta=data.get("scale_delta"),
+         new_priority=data.get("new_priority"),
+     )
+
+
+ def load_replay_actions() -> dict[str, list[CloudQueueAction]]:
+     if not ACTION_TRACE_FILE:
+         return {}
+
+     try:
+         with open(ACTION_TRACE_FILE, "r", encoding="utf-8") as f:
+             payload = json.load(f)
+     except Exception as exc:
+         print(f"[DEBUG] Failed to load ACTION_TRACE_FILE: {exc}", flush=True)
+         return {}
+
+     replay: dict[str, list[CloudQueueAction]] = {}
+     if isinstance(payload, dict):
+         for key, action_list in payload.items():
+             if not isinstance(action_list, list):
+                 continue
+             parsed = []
+             for item in action_list:
+                 if isinstance(item, dict):
+                     parsed.append(_action_from_dict(item))
+             if parsed:
+                 replay[str(key)] = parsed
+     return replay
+
+
+ def ci95(values: list[float]) -> float:
+     if len(values) <= 1:
+         return 0.0
+     std = statistics.pstdev(values)
+     return 1.96 * std / (len(values) ** 0.5)
+
+
+ def write_reports(seed_rows: list[dict], task_score_table: dict[str, list[float]]) -> None:
+     if REPORT_JSON_PATH:
+         report_payload = {
+             "seed_rows": seed_rows,
+             "task_summary": {
+                 task: {
+                     "mean": statistics.mean(scores) if scores else 0.0,
+                     "std": statistics.pstdev(scores) if len(scores) > 1 else 0.0,
+                     "ci95": ci95(scores),
+                     "count": len(scores),
+                 }
+                 for task, scores in task_score_table.items()
+             },
+         }
+         try:
+             with open(REPORT_JSON_PATH, "w", encoding="utf-8") as f:
+                 json.dump(report_payload, f, indent=2)
+         except Exception as exc:
+             print(f"[DEBUG] Failed to write REPORT_JSON_PATH: {exc}", flush=True)
+
+     if REPORT_CSV_PATH:
+         try:
+             with open(REPORT_CSV_PATH, "w", encoding="utf-8", newline="") as f:
+                 writer = csv.DictWriter(
+                     f,
+                     fieldnames=[
+                         "task",
+                         "seed",
+                         "score",
+                         "steps",
+                         "success",
+                         "trace_digest",
+                         "invalid_actions",
+                         "harmful_scale_down",
+                         "failure_reason",
+                     ],
+                 )
+                 writer.writeheader()
+                 for row in seed_rows:
+                     writer.writerow(row)
+         except Exception as exc:
+             print(f"[DEBUG] Failed to write REPORT_CSV_PATH: {exc}", flush=True)
+
+
+ def build_obs_summary(obs: CloudQueueObservation, task_name: str) -> str:
+     max_sizes = {"easy": 28, "medium": 42, "hard": 64}
+     max_q = max_sizes.get(task_name, 30)
+     fills = [f"{l}/{max_q}({100*l//max_q}%)" for l in obs.queue_lengths]
+
+     busy_count = sum(obs.server_busy)
+     total_servers = len(obs.server_busy)
+     servers_str = f"{busy_count}/{total_servers} busy"
+
+     if obs.incoming_job_present:
+         urgency = "URGENT" if obs.incoming_job_priority >= 2 else "normal"
+         incoming_str = f"YES [{urgency} size={obs.incoming_job_size:.1f} deadline={obs.incoming_job_deadline:.0f}]"
+     else:
+         incoming_str = "none"
+
+     return (
+         f"task={task_name} | "
+         f"queues={fills} | "
+         f"servers={servers_str} | "
+         f"incoming={incoming_str} | "
+         f"sla_breach={obs.sla_violation_rate:.3f} | "
+         f"abandonment={obs.abandonment_rate:.3f} | "
+         f"cost_rate={obs.energy_cost_rate:.3f}"
+     )
+
+
+ def build_user_prompt(step: int, obs_summary: str, last_reward: float, history: List[str]) -> str:
+     history_block = "\n".join(history[-4:]) if history else "None"
+     return textwrap.dedent(
+         f"""
+         Step {step} | Last reward: {last_reward:.2f}
+         State: {obs_summary}
+         Recent actions:
+         {history_block}
+         Choose the best action now.
+         """
+     ).strip()
+
+
+ def _coerce_optional_int(value: Any) -> Optional[int]:
+     if value is None:
+         return None
+     if isinstance(value, bool):
+         return int(value)
+     if isinstance(value, int):
+         return value
+     if isinstance(value, float):
+         return int(value)
+     if isinstance(value, str):
+         txt = value.strip().lower()
+         if txt in {"", "null", "none"}:
+             return None
+         try:
+             return int(txt)
+         except ValueError:
+             try:
+                 return int(float(txt))
+             except ValueError:
+                 return None
+     return None
+
+
+ def _extract_json_object(text: str) -> Optional[dict[str, Any]]:
+     cleaned = (text or "").strip()
+     if not cleaned:
+         return None
+
+     if cleaned.startswith("```"):
+         chunks = [chunk.strip() for chunk in cleaned.split("```") if chunk.strip()]
+         for chunk in chunks:
+             candidate = chunk
+             if candidate.lower().startswith("json"):
+                 candidate = candidate[4:].strip()
+             try:
+                 parsed = json.loads(candidate)
+                 if isinstance(parsed, dict):
+                     return parsed
+                 if isinstance(parsed, list) and parsed and isinstance(parsed[0], dict):
+                     return parsed[0]
+             except Exception:
+                 continue
+
+     try:
+         parsed = json.loads(cleaned)
+         if isinstance(parsed, dict):
+             return parsed
+         if isinstance(parsed, list) and parsed and isinstance(parsed[0], dict):
+             return parsed[0]
+     except Exception:
+         pass
+
+     start = 0
+     while True:
+         open_idx = cleaned.find("{", start)
+         if open_idx < 0:
+             return None
+         depth = 0
+         for i in range(open_idx, len(cleaned)):
+             ch = cleaned[i]
+             if ch == "{":
+                 depth += 1
+             elif ch == "}":
+                 depth -= 1
+                 if depth == 0:
+                     candidate = cleaned[open_idx : i + 1]
+                     try:
+                         parsed = json.loads(candidate)
+                         if isinstance(parsed, dict):
+                             return parsed
+                     except Exception:
+                         break
+         start = open_idx + 1
+
+
+ def _normalize_action_payload(data: dict[str, Any], task_name: str) -> Optional[dict[str, Any]]:
+     action_type = str(data.get("action_type", "noop")).strip().lower()
+     if action_type not in ACTION_TYPES:
+         return None
+     if action_type not in TASK_ALLOWED_ACTIONS.get(task_name, set(ACTION_TYPES)):
+         return None
+
+     target_queue = _coerce_optional_int(data.get("target_queue"))
+     target_server = _coerce_optional_int(data.get("target_server"))
+     scale_delta = _coerce_optional_int(data.get("scale_delta"))
+     new_priority = _coerce_optional_int(data.get("new_priority"))
+
+     if action_type in {"admit", "route", "dispatch"} and target_queue is None:
+         target_queue = 0
+     if action_type in {"reject", "noop"}:
+         target_queue = None
+         target_server = None
+
+     if action_type == "scale":
+         if scale_delta is None:
+             return None
+         scale_delta = max(-2, min(2, scale_delta))
+     else:
+         scale_delta = None
+
+     if action_type == "reprioritize":
+         if new_priority is None:
+             new_priority = 2
+     else:
+         new_priority = None
+
+     return {
+         "action_type": action_type,
+         "target_queue": target_queue,
+         "target_server": target_server,
+         "scale_delta": scale_delta,
+         "new_priority": new_priority,
+     }
+
+
+ def parse_model_action(text: str, task_name: str) -> Optional[CloudQueueAction]:
+     data = _extract_json_object(text)
+     if data is None:
+         return None
+     payload = _normalize_action_payload(data, task_name)
+     if payload is None:
+         return None
+     try:
+         return CloudQueueAction(**payload)
+     except Exception:
+         return None
+
+
+ def get_model_action(
+     client: OpenAI,
+     task_name: str,
+     step: int,
+     obs_summary: str,
+     last_reward: float,
+     history: List[str],
+ ) -> tuple[Optional[CloudQueueAction], Optional[str]]:
+     global _SCHEMA_RESPONSE_FORMAT_FAILED
+
+     user_prompt = build_user_prompt(step, obs_summary, last_reward, history)
+     messages = [
+         {"role": "system", "content": SYSTEM_PROMPT},
+         {"role": "user", "content": user_prompt},
+     ]
+
+     try:
+         if not _SCHEMA_RESPONSE_FORMAT_FAILED:
+             try:
+                 completion = client.chat.completions.create(
+                     model=MODEL_NAME,
+                     messages=messages,
+                     temperature=TEMPERATURE,
+                     max_tokens=MAX_TOKENS,
+                     stream=False,
+                     response_format=model_action_response_format(task_name),
+                 )
+             except Exception as schema_exc:
+                 _SCHEMA_RESPONSE_FORMAT_FAILED = True
+                 print(
+                     f"[DEBUG] response_format unavailable, retrying without schema: {schema_exc}",
+                     flush=True,
+                 )
+                 completion = client.chat.completions.create(
+                     model=MODEL_NAME,
+                     messages=messages,
+                     temperature=TEMPERATURE,
+                     max_tokens=MAX_TOKENS,
+                     stream=False,
+                 )
+         else:
+             completion = client.chat.completions.create(
+                 model=MODEL_NAME,
+                 messages=messages,
+                 temperature=TEMPERATURE,
+                 max_tokens=MAX_TOKENS,
+                 stream=False,
+             )
+
+         text = (completion.choices[0].message.content or "").strip()
+         action = parse_model_action(text, task_name)
+         if action is None:
+             preview = " ".join(text.split())[:180]
+             return None, f"invalid_model_action_payload: {preview}"
+         return action, None
+     except Exception as exc:
+         return None, str(exc)
+
+
+ def get_model_action_with_retry(
+     client: OpenAI,
+     task_name: str,
+     step: int,
+     obs_summary: str,
+     last_reward: float,
+     history: List[str],
+     retries: int = 2,
+ ) -> tuple[Optional[CloudQueueAction], Optional[str]]:
+     last_error: Optional[str] = None
+     for attempt in range(1, retries + 2):
+         action, error = get_model_action(
+             client=client,
+             task_name=task_name,
+             step=step,
+             obs_summary=obs_summary,
+             last_reward=last_reward,
+             history=history,
+         )
+         if action is not None:
+             return action, None
+         last_error = error
+         print(f"[DEBUG] Model action parse failed on attempt={attempt}: {error}", flush=True)
+     return None, last_error
+
+
+ def normalize_base_url(base_url: Optional[str]) -> Optional[str]:
+     if not base_url:
+         return base_url
+
+     cleaned = base_url.strip().rstrip("/")
+     parsed = urlparse(cleaned)
+
+     if parsed.netloc.lower() == "huggingface.co":
+         parts = [p for p in parsed.path.strip("/").split("/") if p]
+         if len(parts) >= 3 and parts[0] == "spaces":
+             owner, space = parts[1], parts[2]
+             owner = owner.lower().replace("_", "-")
+             space = space.lower().replace("_", "-")
+             return f"https://{owner}-{space}.hf.space"
+
+     if cleaned.endswith("/web"):
+         cleaned = cleaned[:-4]
+         parsed = urlparse(cleaned)
+
+     host = (parsed.hostname or "").lower()
+     if host.endswith(".hf.space"):
+         safe_host = host.replace("_", "-")
+         if safe_host != host or (parsed.netloc and parsed.netloc != parsed.netloc.lower()):
+             port_part = f":{parsed.port}" if parsed.port else ""
+             parsed = parsed._replace(netloc=f"{safe_host}{port_part}")
+             cleaned = urlunparse(parsed)
+
+     return cleaned
+
+
+ def _smoke_test_model(client: OpenAI) -> bool:
+     print(f"[MODEL_CHECK] Testing model={MODEL_NAME} at {API_BASE_URL} ...", flush=True)
+     test_question = (
+         "You are a cloud scheduling agent. "
+         "A job queue is 80% full and a new urgent job just arrived. "
+         "Should you admit the job, reject it, or route it to another queue? "
+         "Answer with exactly one JSON object containing action_type and optional fields."
+     )
+     try:
+         resp = client.chat.completions.create(
+             model=MODEL_NAME,
+             messages=[{"role": "user", "content": test_question}],
+             temperature=0.0,
+             max_tokens=80,
+         )
+         reply = (resp.choices[0].message.content or "").strip()
+         if not reply:
+             print("[MODEL_FAIL] Model returned an empty response.", flush=True)
+             return False
+         print("[MODEL_OK] model endpoint reachable.", flush=True)
+         return True
+     except Exception as exc:
+         print(f"[MODEL_FAIL] Cannot reach model: {exc}", flush=True)
+         return False
+
+
+ async def main() -> None:
+     if not API_KEY:
+         raise ValueError("API_KEY or HF_TOKEN is required for strict model inference.")
+
+     client = OpenAI(base_url=API_BASE_URL, api_key=API_KEY)
+     if not _smoke_test_model(client):
+         raise RuntimeError("Model smoke test failed. Aborting strict model-only run.")
+
+     runtime_base_url = normalize_base_url(BASE_URL)
+     if runtime_base_url:
+         env = CloudQueueEnv(base_url=runtime_base_url)
+     else:
+         if not IMAGE_NAME:
+             raise ValueError("Set BASE_URL for deployed env, or IMAGE_NAME for local docker env.")
+         env = await CloudQueueEnv.from_docker_image(IMAGE_NAME)
+
+     try:
+         task_seed_map = parse_task_seed_map()
+         replay_map = load_replay_actions()
+         task_score_table: dict[str, list[float]] = {}
+         seed_rows: list[dict] = []
+
+         for task_name in TASKS:
+             seeds = task_seed_map.get(task_name, [])
+             if not seeds:
+                 continue
+
+             task_score_table[task_name] = []
+
+             for seed in seeds:
+                 history: List[str] = []
+                 rewards: List[float] = []
+                 steps_taken = 0
+                 score = 0.0
+                 success = False
+                 failure_reason: Optional[str] = None
+
+                 log_start(task=task_name, env=BENCHMARK, model=MODEL_NAME)
+
+                 await env.reset()
+                 await env.step(CloudQueueAction(action_type="configure_task", task_id=task_name, seed=seed))
+                 result = await env.reset()
+
+                 last_reward = 0.0
+                 max_steps = max(1, int(result.observation.horizon))
+                 if MAX_STEPS_OVERRIDE > 0:
+                     max_steps = min(max_steps, MAX_STEPS_OVERRIDE)
+
+                 replay_key = f"{task_name}:{seed}"
+                 replay_actions = replay_map.get(replay_key, [])
+
+                 for step in range(1, max_steps + 1):
+                     if result.done:
+                         break
+
+                     obs = result.observation
+                     obs_summary = build_obs_summary(obs, task_name)
+
+                     action: Optional[CloudQueueAction] = None
+                     model_error: Optional[str] = None
+
+                     if step - 1 < len(replay_actions):
+                         action = replay_actions[step - 1]
+                     else:
+                         action, model_error = get_model_action_with_retry(
+                             client=client,
+                             task_name=task_name,
+                             step=step,
+                             obs_summary=obs_summary,
+                             last_reward=last_reward,
+                             history=history,
+                             retries=2,
+                         )
+
+                     if action is None:
+                         failure_reason = f"model_action_unavailable: {model_error}"
+                         log_step(
+                             step=step,
+                             action="model_action_error",
+                             reward=0.0,
+                             done=True,
+                             error=failure_reason,
+                         )
+                         steps_taken = step
+                         break
+
+                     result = await env.step(action)
+                     reward = float(result.reward or 0.0)
+                     done = bool(result.done)
+                     error = None
+                     meta = result.observation.metadata or {}
+                     info = meta.get("info", {}) if isinstance(meta, dict) else {}
+                     if isinstance(info, dict) and info.get("valid_action") is False:
+                         error = str(info.get("note", "invalid_action"))
+
+                     rewards.append(reward)
+                     steps_taken = step
+                     last_reward = reward
+
+                     action_str = (
+                         f"{action.action_type}(q={action.target_queue},s={action.target_server},"
+                         f"d={action.scale_delta},p={action.new_priority})"
+                     )
+                     log_step(step=step, action=action_str, reward=reward, done=done, error=error)
+                     history.append(f"step={step} action={action_str} reward={reward:.2f}")
+
+                     if done:
+                         break
+
+                 if failure_reason is None and isinstance(result.observation.metadata, dict):
+                     score = float(result.observation.metadata.get("episode_score", 0.0) or 0.0)
+                     _m = result.observation.metadata
+                     print(
+                         f"[DEBUG_META] task={task_name} seed={seed} "
+                         f"episode_score={_m.get('episode_score')} "
+                         f"score_details={_m.get('score_details')} "
+                         f"metrics_completed={_m.get('metrics', {}).get('completed')} "
+                         f"metrics_arrivals={_m.get('metrics', {}).get('arrivals')}",
+                         flush=True,
+                     )
+                 elif failure_reason is not None:
+                     score = 0.0
+
+                 if failure_reason is None and not bool(result.done):
+                     failure_reason = "episode_not_done_within_max_steps"
+                     print(
+                         "[DEBUG] Episode ended early before done=true; "
+                         "set MAX_STEPS_OVERRIDE=0 or unset it for valid benchmark scores.",
+                         flush=True,
+                     )
+                     score = 0.0
+
+                 score = max(0.0, min(1.0, score))
+                 task_score_table[task_name].append(score)
+                 success = failure_reason is None and score >= SUCCESS_SCORE_THRESHOLD
+                 log_end(success=success, steps=steps_taken, score=score, rewards=rewards)
+
+                 meta = result.observation.metadata or {}
+                 metrics = meta.get("metrics", {}) if isinstance(meta, dict) else {}
+                 seed_row = {
+                     "task": task_name,
+                     "seed": int(seed),
+                     "score": round(score, 6),
+                     "steps": int(steps_taken),
+                     "success": bool(success),
+                     "trace_digest": str(meta.get("trace_digest", "")),
+                     "invalid_actions": float(metrics.get("invalid_actions", 0.0)),
+                     "harmful_scale_down": float(metrics.get("harmful_scale_down", 0.0)),
+                     "failure_reason": failure_reason or "",
+                 }
+                 seed_rows.append(seed_row)
+                 print(
+                     "[REPORT_SEED] "
+                     f"task={seed_row['task']} seed={seed_row['seed']} score={seed_row['score']:.3f} "
+                     f"steps={seed_row['steps']} trace={seed_row['trace_digest']}",
+                     flush=True,
+                 )
+
+             task_scores = task_score_table[task_name]
+             task_mean = statistics.mean(task_scores) if task_scores else 0.0
+             task_std = statistics.pstdev(task_scores) if len(task_scores) > 1 else 0.0
+             task_ci = ci95(task_scores)
+             print(
+                 f"[REPORT] task={task_name} seeds={len(task_scores)} mean={task_mean:.3f} std={task_std:.3f} ci95={task_ci:.3f}",
+                 flush=True,
+             )
+
+         all_task_means = []
+         for task_name in TASKS:
+             scores = task_score_table.get(task_name, [])
+             if scores:
+                 all_task_means.append(statistics.mean(scores))
+
+         if all_task_means:
+             final_score = sum(all_task_means) / len(all_task_means)
+             easy_mean = statistics.mean(task_score_table.get("easy", [0.0]))
+             medium_mean = statistics.mean(task_score_table.get("medium", [0.0]))
+             hard_mean = statistics.mean(task_score_table.get("hard", [0.0]))
+             print(
+                 f"[SUMMARY] easy={easy_mean:.3f} medium={medium_mean:.3f} hard={hard_mean:.3f} final={final_score:.3f}",
+                 flush=True,
+             )
+
+         write_reports(seed_rows=seed_rows, task_score_table=task_score_table)
+
+     finally:
+         try:
+             await env.close()
+         except Exception as exc:
+             print(f"[DEBUG] env.close() error (container cleanup): {exc}", flush=True)
+
+
+ if __name__ == "__main__":
+     asyncio.run(main())
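`load_replay_actions()` above accepts a JSON file pointed to by `ACTION_TRACE_FILE`. The script does not ship a sample trace, so the shape below is inferred from `replay_key = f"{task_name}:{seed}"` and `_action_from_dict()`: top-level keys are `"<task>:<seed>"` and each value is a list of action dicts. A minimal sketch with illustrative actions:

```python
import json
import tempfile

# Hypothetical replay trace in the shape load_replay_actions() expects:
# keys are "<task>:<seed>", values are lists of action dicts.
trace = {
    "easy:11": [
        {"action_type": "admit", "target_queue": 0},
        {"action_type": "dispatch", "target_queue": 0},
        {"action_type": "noop", "target_queue": None},
    ]
}

# Write it out; point ACTION_TRACE_FILE at this path before running.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(trace, f)
    trace_path = f.name

# Round-trip to confirm the file parses back into the expected shape.
with open(trace_path, encoding="utf-8") as f:
    loaded = json.load(f)
```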
models.py ADDED
@@ -0,0 +1,55 @@
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
+ # All rights reserved.
+ #
+ # This source code is licensed under the BSD-style license found in the
+ # LICENSE file in the root directory of this source tree.
+
+ """Data models for the Cloud Queue Env queue operations environment."""
+
+ from openenv.core.env_server.types import Action, Observation
+ from pydantic import Field
+
+
+ class CloudQueueAction(Action):
+     """Action model for queue control decisions."""
+
+     action_type: str = Field(
+         default="noop",
+         description=(
+             "One of: configure_task, admit, reject, route, dispatch, scale, reprioritize, noop"
+         ),
+     )
+     target_queue: int | None = Field(default=None, description="Queue index for admit/route/dispatch")
+     target_server: int | None = Field(default=None, description="Server index for dispatch")
+     scale_delta: int | None = Field(default=None, description="Server pool scale delta for scale action")
+     new_priority: int | None = Field(default=None, description="Updated priority for reprioritize action")
+     task_id: str | None = Field(default=None, description="Task selector: easy, medium, or hard")
+     seed: int | None = Field(default=None, description="Deterministic seed for upcoming reset")
+
+
+ class CloudQueueObservation(Observation):
+     """Observation model exposing queue system state to the agent."""
+
+     task_id: str = Field(default="easy", description="Active benchmark task")
+     sim_time: int = Field(default=0, description="Discrete simulation time step")
+     horizon: int = Field(default=0, description="Episode horizon")
+     queue_lengths: list[int] = Field(default_factory=list, description="Length per queue")
+     queue_wait_ema: list[float] = Field(default_factory=list, description="EMA wait time per queue")
+     server_busy: list[int] = Field(default_factory=list, description="1 if server is busy, else 0")
+     server_remaining_service: list[float] = Field(
+         default_factory=list,
+         description="Remaining service time per server",
+     )
+     utilization: list[float] = Field(default_factory=list, description="Rolling utilization by server")
+     incoming_job_present: bool = Field(default=False, description="Whether a new job is waiting for admission")
+     incoming_job_size: float = Field(default=0.0, description="Incoming job estimated size")
+     incoming_job_priority: int = Field(default=0, description="Incoming job priority")
+     incoming_job_deadline: float = Field(default=0.0, description="Incoming job deadline")
+     incoming_job_type: int = Field(default=0, description="Incoming job class/type id")
+     sla_violation_rate: float = Field(default=0.0, description="Running SLA violation rate")
+     abandonment_rate: float = Field(default=0.0, description="Running abandonment rate")
+     throughput_recent: float = Field(default=0.0, description="Completed jobs in current step")
+     energy_cost_rate: float = Field(default=0.0, description="Current infrastructure cost rate")
+     level: float = Field(default=1.0, description="Difficulty level scalar")
+     optional_history: list[float] = Field(default_factory=list, description="Compact recent context")
+     action_mask: list[int] = Field(default_factory=list, description="Optional valid action hints")
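`CloudQueueAction` serializes to a flat JSON object. A minimal sketch of the wire payload for a `configure_task` call; the field names come from the model above, the values are illustrative, and the actual serialization is handled by pydantic inside openenv:

```python
import json

# Illustrative payload mirroring CloudQueueAction's fields (shape only).
configure = {
    "action_type": "configure_task",
    "target_queue": None,
    "target_server": None,
    "scale_delta": None,
    "new_priority": None,
    "task_id": "medium",  # task selector: easy, medium, or hard
    "seed": 23,           # deterministic seed for the upcoming reset
}

# Encode as it would travel over HTTP, then decode to verify round-tripping.
payload = json.dumps(configure)
decoded = json.loads(payload)
```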
openenv.yaml ADDED
@@ -0,0 +1,30 @@
+ spec_version: 1
+ name: cloud_queue_env
+ type: space
+ runtime: fastapi
+ app: server.app:app
+ port: 8000
+
+ metadata:
+   description: >
+     A real-world queueing control environment where an agent manages
+     cloud request scheduling decisions (admission control, routing,
+     dispatching, and dynamic server scaling) under stochastic arrivals
+     and service times. Optimizes latency, throughput, SLA compliance,
+     fairness, and infrastructure cost across three benchmark tasks
+     (Easy / Medium / Hard) with deterministic graders scored in (0, 1).
+   tags:
+     - openenv
+     - reinforcement-learning
+     - queueing
+     - scheduling
+     - cloud-operations
+     - multi-objective
+     - llm-agent
+   difficulty: easy-to-hard
+   tasks:
+     - easy
+     - medium
+     - hard
+   author: Mrkumar007
+
pyproject.toml ADDED
@@ -0,0 +1,45 @@
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
+ # All rights reserved.
+ #
+ # This source code is licensed under the BSD-style license found in the
+ # LICENSE file in the root directory of this source tree.
+
+ [build-system]
+ requires = ["setuptools>=45", "wheel"]
+ build-backend = "setuptools.build_meta"
+
+ [project]
+ name = "openenv-cloud_queue_env"
+ version = "0.1.0"
+ description = "Cloud Queue Env environment for OpenEnv"
+ requires-python = ">=3.10"
+ dependencies = [
+     # Core OpenEnv runtime (provides FastAPI server + HTTP client types)
+     # install from github
+     # "openenv-core[core] @ git+https://github.com/meta-pytorch/OpenEnv.git",
+     "openenv-core[core]>=0.2.2",
+     # Environment-specific dependencies
+     # Add all dependencies needed for your environment here
+     # Examples:
+     # "numpy>=1.19.0",
+     # "torch>=2.0.0",
+     # "gymnasium>=0.29.0",
+     # "openspiel>=1.0.0",
+     # "smolagents>=1.22.0,<2",
+ ]
+
+ [project.optional-dependencies]
+ dev = [
+     "pytest>=8.0.0",
+     "pytest-cov>=4.0.0",
+ ]
+
+ [project.scripts]
+ # Server entry point - enables running via: uv run --project . server
+ # or: python -m cloud_queue_env.server.app
+ server = "cloud_queue_env.server.app:main"
+
+ [tool.setuptools]
+ include-package-data = true
+ packages = ["cloud_queue_env", "cloud_queue_env.server"]
+ package-dir = { "cloud_queue_env" = ".", "cloud_queue_env.server" = "server" }
ref_inference.py ADDED
@@ -0,0 +1,188 @@
+ """
+ Inference Script Example
+ ===================================
+ MANDATORY
+ - Before submitting, ensure the following variables are defined in your environment configuration:
+     API_BASE_URL      The API endpoint for the LLM.
+     MODEL_NAME        The model identifier to use for inference.
+     HF_TOKEN          Your Hugging Face / API key.
+     LOCAL_IMAGE_NAME  The name of the local image to use for the environment
+                       if you are using the from_docker_image() method.
+
+ - Defaults are set only for API_BASE_URL and MODEL_NAME
+   (and should reflect your active inference setup):
+     API_BASE_URL = os.getenv("API_BASE_URL", "<your-active-endpoint>")
+     MODEL_NAME = os.getenv("MODEL_NAME", "<your-active-model>")
+
+ - The inference script must be named `inference.py` and placed in the root directory of the project.
+ - Participants must use the OpenAI client for all LLM calls, configured with the variables above.
+
+ STDOUT FORMAT
+ - The script must emit exactly three line types to stdout, in this order:
+
+     [START] task=<task_name> env=<benchmark> model=<model_name>
+     [STEP] step=<n> action=<action_str> reward=<0.00> done=<true|false> error=<msg|null>
+     [END] success=<true|false> steps=<n> score=<score> rewards=<r1,r2,...,rn>
+
+ Rules:
+ - One [START] line at episode begin.
+ - One [STEP] line per step, immediately after env.step() returns.
+ - One [END] line after env.close(), always emitted (even on exception).
+ - reward and rewards are formatted to 2 decimal places.
+ - done and success are lowercase booleans: true or false.
+ - error is the raw last_action_error string, or null if none.
+ - All fields on a single line with no newlines within a line.
+ - Each task must return a score in [0, 1].
+
+ Example:
+     [START] task=click-test env=miniwob model=Qwen3-VL-30B
+     [STEP] step=1 action=click('123') reward=0.00 done=false error=null
+     [STEP] step=2 action=fill('456','text') reward=0.00 done=false error=null
+     [STEP] step=3 action=click('789') reward=1.00 done=true error=null
+     [END] success=true steps=3 score=1.00 rewards=0.00,0.00,1.00
+ """
+
+ import asyncio
+ import os
+ import textwrap
+ from typing import List, Optional
+
+ from openai import OpenAI
+
+ from my_env_v4 import MyEnvV4Action, MyEnvV4Env
+
+ IMAGE_NAME = os.getenv("IMAGE_NAME")  # If you are using docker image
+ API_KEY = os.getenv("HF_TOKEN") or os.getenv("API_KEY")
+
+ API_BASE_URL = os.getenv("API_BASE_URL") or "https://router.huggingface.co/v1"
+ MODEL_NAME = os.getenv("MODEL_NAME") or "Qwen/Qwen2.5-72B-Instruct"
+ TASK_NAME = os.getenv("MY_ENV_V4_TASK", "echo")
+ BENCHMARK = os.getenv("MY_ENV_V4_BENCHMARK", "my_env_v4")
+ MAX_STEPS = 8
+ TEMPERATURE = 0.7
+ MAX_TOKENS = 150
+ SUCCESS_SCORE_THRESHOLD = 0.1  # normalized score in [0, 1]
+
+ # Max possible reward: each token contributes 0.1, across all steps
+ _MAX_REWARD_PER_STEP = MAX_TOKENS * 0.1
+ MAX_TOTAL_REWARD = MAX_STEPS * _MAX_REWARD_PER_STEP
+
+ SYSTEM_PROMPT = textwrap.dedent(
+     """
+     You are interacting with a simple echo environment.
+     Each turn you must send a message. The environment will echo it back.
+     Reward is proportional to message length: reward = len(message) * 0.1
+     Your goal is to maximize total reward by sending meaningful, substantive messages.
+     Reply with exactly one message string — no quotes, no prefixes, just the message text.
+     """
+ ).strip()
+
+
+ def log_start(task: str, env: str, model: str) -> None:
+     print(f"[START] task={task} env={env} model={model}", flush=True)
+
+
+ def log_step(step: int, action: str, reward: float, done: bool, error: Optional[str]) -> None:
+     error_val = error if error else "null"
+     done_val = str(done).lower()
+     print(
+         f"[STEP] step={step} action={action} reward={reward:.2f} done={done_val} error={error_val}",
+         flush=True,
+     )
+
+
+ def log_end(success: bool, steps: int, score: float, rewards: List[float]) -> None:
+     rewards_str = ",".join(f"{r:.2f}" for r in rewards)
+     print(f"[END] success={str(success).lower()} steps={steps} score={score:.3f} rewards={rewards_str}", flush=True)
+
+
+ def build_user_prompt(step: int, last_echoed: str, last_reward: float, history: List[str]) -> str:
+     history_block = "\n".join(history[-4:]) if history else "None"
+     return textwrap.dedent(
+         f"""
+         Step: {step}
+         Last echoed message: {last_echoed!r}
+         Last reward: {last_reward:.2f}
+         Previous steps:
+         {history_block}
+         Send your next message.
+         """
+     ).strip()
+
+
+ def get_model_message(client: OpenAI, step: int, last_echoed: str, last_reward: float, history: List[str]) -> str:
+     user_prompt = build_user_prompt(step, last_echoed, last_reward, history)
+     try:
+         completion = client.chat.completions.create(
+             model=MODEL_NAME,
+             messages=[
+                 {"role": "system", "content": SYSTEM_PROMPT},
+                 {"role": "user", "content": user_prompt},
+             ],
+             temperature=TEMPERATURE,
+             max_tokens=MAX_TOKENS,
+             stream=False,
+         )
+         text = (completion.choices[0].message.content or "").strip()
+         return text if text else "hello"
+     except Exception as exc:
+         print(f"[DEBUG] Model request failed: {exc}", flush=True)
+         return "hello"
+
+
+ async def main() -> None:
+     client = OpenAI(base_url=API_BASE_URL, api_key=API_KEY)
+
+     env = await MyEnvV4Env.from_docker_image(IMAGE_NAME)
+
+     history: List[str] = []
+     rewards: List[float] = []
+     steps_taken = 0
+     score = 0.0
+     success = False
+
+     log_start(task=TASK_NAME, env=BENCHMARK, model=MODEL_NAME)
+
+     try:
+         result = await env.reset()  # OpenEnv reset()
+         last_echoed = result.observation.echoed_message
+         last_reward = 0.0
+
+         for step in range(1, MAX_STEPS + 1):
+             if result.done:
+                 break
+
+             message = get_model_message(client, step, last_echoed, last_reward, history)
+
+             result = await env.step(MyEnvV4Action(message=message))
+             obs = result.observation
+
+             reward = result.reward or 0.0
+             done = result.done
+             error = None
+
+             rewards.append(reward)
+             steps_taken = step
+             last_echoed = obs.echoed_message
+             last_reward = reward
+
+             log_step(step=step, action=message, reward=reward, done=done, error=error)
+
+             history.append(f"Step {step}: {message!r} -> reward {reward:+.2f}")
+
+             if done:
+                 break
+
+         score = sum(rewards) / MAX_TOTAL_REWARD if MAX_TOTAL_REWARD > 0 else 0.0
+         score = min(max(score, 0.0), 1.0)  # clamp to [0, 1]
+         success = score >= SUCCESS_SCORE_THRESHOLD
+
+     finally:
+         try:
+             await env.close()
+         except Exception as e:
+             print(f"[DEBUG] env.close() error (container cleanup): {e}", flush=True)
+         log_end(success=success, steps=steps_taken, score=score, rewards=rewards)
+
+
+ if __name__ == "__main__":
+     asyncio.run(main())
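The `[START]`/`[STEP]`/`[END]` stdout contract above is easy to lint mechanically before submitting. A hedged sketch of such a checker — the regexes below are my own approximation of the documented format, not part of any official validator:

```python
import re

# One pattern per documented line type; fields mirror the STDOUT FORMAT spec.
STEP_RE = re.compile(
    r"\[STEP\] step=(\d+) action=(.+) reward=(-?\d+\.\d{2}) done=(true|false) error=(.+)"
)
END_RE = re.compile(
    r"\[END\] success=(true|false) steps=(\d+) score=(-?\d+\.\d+) rewards=(.*)"
)


def parse_episode(lines: list[str]) -> tuple[float, list[float]]:
    """Check [START]/[STEP]/[END] ordering and return (score, per-step rewards)."""
    if not lines or not lines[0].startswith("[START]"):
        raise ValueError("episode must begin with a [START] line")
    for line in lines[1:-1]:
        if not STEP_RE.fullmatch(line):
            raise ValueError(f"malformed step line: {line!r}")
    end = END_RE.fullmatch(lines[-1])
    if not end:
        raise ValueError("episode must end with an [END] line")
    score = float(end.group(3))
    rewards = [float(r) for r in end.group(4).split(",")] if end.group(4) else []
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    if len(rewards) != int(end.group(2)):
        raise ValueError("rewards count must match the steps field")
    return score, rewards


# The example episode from the docstring above.
EXAMPLE = [
    "[START] task=click-test env=miniwob model=Qwen3-VL-30B",
    "[STEP] step=1 action=click('123') reward=0.00 done=false error=null",
    "[STEP] step=2 action=fill('456','text') reward=0.00 done=false error=null",
    "[STEP] step=3 action=click('789') reward=1.00 done=true error=null",
    "[END] success=true steps=3 score=1.00 rewards=0.00,0.00,1.00",
]

score, rewards = parse_episode(EXAMPLE)
```

Running the checker over a captured log before submission catches ordering mistakes (for example, a missing `[END]` line after an exception) cheaply.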
server/__init__.py ADDED
@@ -0,0 +1,11 @@
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
+ # All rights reserved.
+ #
+ # This source code is licensed under the BSD-style license found in the
+ # LICENSE file in the root directory of this source tree.
+
+ """Cloud Queue Env environment server components."""
+
+ from .cloud_queue_env_environment import CloudQueueEnvironment
+
+ __all__ = ["CloudQueueEnvironment"]
server/app.py ADDED
@@ -0,0 +1,89 @@
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
+ # All rights reserved.
+ #
+ # This source code is licensed under the BSD-style license found in the
+ # LICENSE file in the root directory of this source tree.
+
+ """
+ FastAPI application for the Cloud Queue Env Environment.
+
+ This module creates an HTTP server that exposes the CloudQueueEnvironment
+ over HTTP and WebSocket endpoints, compatible with EnvClient.
+
+ Endpoints:
+     - POST /reset: Reset the environment
+     - POST /step: Execute an action
+     - GET /state: Get current environment state
+     - GET /schema: Get action/observation schemas
+     - WS /ws: WebSocket endpoint for persistent sessions
+
+ Usage:
+     # Development (with auto-reload):
+     uvicorn server.app:app --reload --host 0.0.0.0 --port 8000
+
+     # Production:
+     uvicorn server.app:app --host 0.0.0.0 --port 8000 --workers 4
+
+     # Or run directly:
+     python -m server.app
+ """
+
+ try:
+     from openenv.core.env_server.http_server import create_app
+ except Exception as e:  # pragma: no cover
+     raise ImportError(
+         "openenv is required for the web interface. Install dependencies with 'uv sync'."
+     ) from e
+
+ try:
+     from ..models import CloudQueueAction, CloudQueueObservation
+     from .cloud_queue_env_environment import CloudQueueEnvironment
+ except ImportError:
+     from models import CloudQueueAction, CloudQueueObservation
+     from server.cloud_queue_env_environment import CloudQueueEnvironment
+
+
+ # Create the app with web interface and README integration
+ app = create_app(
+     CloudQueueEnvironment,
+     CloudQueueAction,
+     CloudQueueObservation,
+     env_name="cloud_queue_env",
+     max_concurrent_envs=1,  # increase this number to allow more concurrent WebSocket sessions
+ )
+
+
+ def main(host: str = "0.0.0.0", port: int = 8000) -> None:
+     """
+     Entry point for direct execution via uv run or python -m.
+
+     This function enables running the server without Docker:
+         uv run --project . server
+         uv run --project . server --port 8001
+         python -m cloud_queue_env.server.app
+
+     Args:
+         host: Host address to bind to (default: "0.0.0.0")
+         port: Port number to listen on (default: 8000)
+
+     For production deployments, consider using uvicorn directly with
+     multiple workers:
+         uvicorn cloud_queue_env.server.app:app --workers 4
+     """
+     import uvicorn
+
+     uvicorn.run(app, host=host, port=port)
+
+
+ def _cli_main() -> None:
+     import argparse
+
+     parser = argparse.ArgumentParser()
+     parser.add_argument("--port", type=int, default=8000)
+     parser.add_argument("--host", type=str, default="0.0.0.0")
+     args = parser.parse_args()
+     main(host=args.host, port=args.port)
+
+
+ if __name__ == "__main__":
+     _cli_main()  # parse --host/--port so the documented flags actually take effect
server/cloud_queue_env_environment.py ADDED
@@ -0,0 +1,762 @@
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ # All rights reserved.
3
+ #
4
+ # This source code is licensed under the BSD-style license found in the
5
+ # LICENSE file in the root directory of this source tree.
6
+
7
+ """Queue operations environment with deterministic task grading."""
8
+
9
+ import math
10
+ import random
11
+ import hashlib
12
+ from collections import deque
13
+ from dataclasses import dataclass
14
+ from uuid import uuid4
15
+
16
+ from openenv.core.env_server.interfaces import Environment
17
+ from openenv.core.env_server.types import State
18
+
19
+ try:
20
+ from ..models import CloudQueueAction, CloudQueueObservation
21
+ except ImportError:
22
+ from models import CloudQueueAction, CloudQueueObservation
23
+
24
+
25
+ @dataclass
26
+ class TaskConfig:
27
+ task_id: str
28
+ horizon: int
29
+ level: float
30
+ queue_count: int
31
+ initial_servers: int
32
+ min_servers: int
33
+ max_servers: int
34
+ arrival_rate: float
35
+ urgent_ratio: float
36
+ service_mean: float
37
+ deadline_base: int
38
+ allow_scaling: bool
39
+ allow_priority: bool
40
+ two_stage: bool
41
+ server_cost: float
42
+ max_queue_size: int
43
+ score_refs: dict[str, float]
44
+
45
+
46
+ class CloudQueueEnvironment(Environment):
47
+ """Deterministic queueing environment with easy/medium/hard benchmark tasks."""
48
+
49
+ SUPPORTS_CONCURRENT_SESSIONS: bool = True
50
+
51
+ def __init__(self):
52
+ self._task_configs = self._build_task_configs()
53
+ self._active_task_id = "easy"
54
+ self._pending_task_id = "easy"
55
+ self._pending_seed = 7
56
+ self._rng_streams: dict[str, random.Random] = {}
57
+ self._rng_stream_seeds: dict[str, int] = {}
58
+ self._state = State(episode_id=str(uuid4()), step_count=0)
59
+ self._sim_time = 0
60
+ self._queues: list[deque[dict]] = []
61
+ self._servers: list[dict] = []
62
+ self._incoming_job: dict | None = None
63
+ self._done = False
64
+ self._wait_ema: list[float] = []
65
+ self._utilization_ema: list[float] = []
66
+ self._metrics: dict[str, float] = {}
67
+ self._recent_rewards: deque[float] = deque(maxlen=8)
68
+ self._action_trace: list[str] = []
69
+ self._reset_runtime_state()
70
+
71
+ def _build_task_configs(self) -> dict[str, TaskConfig]:
72
+ return {
73
+ "easy": TaskConfig(
74
+ task_id="easy",
75
+ horizon=150,
76
+ level=1.0,
77
+ queue_count=1,
78
+ initial_servers=1,
79
+ min_servers=1,
80
+ max_servers=1,
81
+ arrival_rate=0.78,
82
+ urgent_ratio=0.0,
83
+ service_mean=1.6,
84
+ deadline_base=10,
85
+ allow_scaling=False,
86
+ allow_priority=False,
87
+ two_stage=False,
88
+ server_cost=0.04,
89
+ max_queue_size=28,
90
+ score_refs={"wait": 6.0, "thr": 70.0, "rej": 0.3, "sla": 0.3},
91
+ ),
92
+ "medium": TaskConfig(
93
+ task_id="medium",
94
+ horizon=200,
95
+ level=2.3,
96
+ queue_count=2,
97
+ initial_servers=3,
98
+ min_servers=3, # scaling disabled on medium — lock to initial_servers
99
+ max_servers=3, # scaling disabled on medium — lock to initial_servers
100
+ arrival_rate=1.15,
101
+ urgent_ratio=0.28,
102
+ service_mean=1.8,
103
+ deadline_base=8,
104
+ allow_scaling=False,
105
+ allow_priority=True,
106
+ two_stage=False,
107
+ server_cost=0.06,
108
+ max_queue_size=42,
109
+ score_refs={"uw": 7.0, "nw": 10.0, "usla": 0.25, "thr": 125.0, "cost": 14.0},
110
+ ),
111
+ "hard": TaskConfig(
112
+ task_id="hard",
113
+ horizon=250,
114
+ level=4.0,
115
+ queue_count=2,
116
+ initial_servers=3,
117
+ min_servers=1,
118
+ max_servers=6,
119
+ arrival_rate=1.45,
120
+ urgent_ratio=0.35,
121
+ service_mean=2.2,
122
+ deadline_base=7,
123
+ allow_scaling=True,
124
+ allow_priority=True,
125
+ two_stage=True,
126
+ server_cost=0.1,
127
+ max_queue_size=64,
128
+ score_refs={
129
+ "e2e": 14.0,
130
+ "abd": 0.25,
131
+ "sla": 0.3,
132
+ "thr": 145.0,
133
+ "cost": 28.0,
134
+ "fair": 0.35,
135
+ },
136
+ ),
137
+ }
138
+
139
+ def _reset_runtime_state(self) -> None:
140
+ cfg = self._task_configs[self._active_task_id]
141
+ self._sim_time = 0
142
+ self._done = False
143
+ self._incoming_job = None
144
+ self._action_trace = []
145
+ self._queues = [deque() for _ in range(cfg.queue_count)]
146
+ self._servers = [
147
+ {"remaining": 0.0, "job": None, "active": True}
148
+ for _ in range(cfg.initial_servers)
149
+ ]
150
+ self._wait_ema = [0.0 for _ in range(cfg.queue_count)]
151
+ self._utilization_ema = [0.0 for _ in range(cfg.max_servers)]
152
+ self._recent_rewards.clear()
153
+ self._metrics = {
154
+ "arrivals": 0.0,
155
+ "accepted": 0.0,
156
+ "rejected": 0.0,
157
+ "completed": 0.0,
158
+ "completed_urgent": 0.0,
159
+ "abandoned": 0.0,
160
+ "wait_sum": 0.0,
161
+ "wait_count": 0.0,
162
+ "wait_sum_urgent": 0.0,
163
+ "wait_count_urgent": 0.0,
164
+ "wait_sum_normal": 0.0,
165
+ "wait_count_normal": 0.0,
166
+ "sla_breaches": 0.0,
167
+ "sla_breaches_urgent": 0.0,
168
+ "invalid_actions": 0.0,
169
+ "noop_under_load": 0.0,
170
+ "harmful_scale_down": 0.0,
171
+ "action_cost": 0.0,
172
+ "infra_cost": 0.0,
173
+ "fairness_gap_sum": 0.0,
174
+ "fairness_gap_count": 0.0,
175
+ }
176
+ self._wait_samples_all: list[float] = []
177
+ self._wait_samples_urgent: list[float] = []
178
+ self._wait_samples_normal: list[float] = []
179
+ self._e2e_wait_samples: list[float] = []
180
+
181
+ def _init_rng_streams(self, base_seed: int) -> None:
182
+ self._rng_stream_seeds = {
183
+ "arrivals": int(base_seed) + 101,
184
+ "service": int(base_seed) + 211,
185
+ "abandonment": int(base_seed) + 307,
186
+ "exogenous": int(base_seed) + 401,
187
+ }
188
+ self._rng_streams = {
189
+ key: random.Random(seed) for key, seed in self._rng_stream_seeds.items()
190
+ }
191
+
192
+ def _rng(self, stream: str) -> random.Random:
193
+ return self._rng_streams[stream]
194
+
195
+ def _sample_poisson(self, lam: float, rng: random.Random) -> int:
196
+ lam = max(0.0, lam)
197
+ if lam == 0.0:
198
+ return 0
199
+ # Knuth algorithm is sufficient for this environment's lambda scale.
200
+ l_term = math.exp(-lam)
201
+ k = 0
202
+ p = 1.0
203
+ while p > l_term:
204
+ k += 1
205
+ p *= rng.random()
206
+ return max(0, k - 1)
207
+
208
+ def _trace_digest(self) -> str:
209
+ raw = f"task={self._active_task_id}|seed={self._pending_seed}|" + "|".join(self._action_trace)
210
+ return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:16]
211
+
212
+ def reset(self) -> CloudQueueObservation:
213
+ self._active_task_id = self._pending_task_id if self._pending_task_id in self._task_configs else "easy"
214
+ self._init_rng_streams(self._pending_seed)
215
+ self._state = State(episode_id=str(uuid4()), step_count=0)
216
+ self._reset_runtime_state()
217
+ return self._build_observation(reward=0.0, done=False, info={"event": "reset"})
218
+
219
+ def _clamp(self, value: float, lo: float, hi: float) -> float:
220
+ return max(lo, min(hi, value))
221
+
222
+ def _sample_service_time(self, cfg: TaskConfig) -> float:
223
+ service_rng = self._rng("service")
224
+ if cfg.task_id == "hard":
225
+ heavy = service_rng.random() < 0.22
226
+ if heavy:
227
+ return self._clamp(service_rng.lognormvariate(1.2, 0.7), 1.0, 12.0)
228
+ return self._clamp(service_rng.expovariate(1.0 / cfg.service_mean), 0.5, 10.0)
229
+
230
+ def _sample_arrivals(self, cfg: TaskConfig) -> int:
231
+ arrival_rng = self._rng("arrivals")
232
+ exogenous_rng = self._rng("exogenous")
233
+ rate = cfg.arrival_rate
234
+ if cfg.task_id == "hard":
235
+ wave = 0.35 * math.sin((self._sim_time + 1) / 13.0)
236
+ jitter = exogenous_rng.uniform(-0.05, 0.05)
237
+ rate += wave + jitter
238
+ return self._sample_poisson(rate, arrival_rng)
239
+
240
+ def _spawn_incoming_job(self, cfg: TaskConfig) -> None:
241
+ arrivals = self._sample_arrivals(cfg)
242
+ if arrivals <= 0:
243
+ self._incoming_job = None
244
+ return
245
+ arrival_rng = self._rng("arrivals")
246
+ priority = 2 if arrival_rng.random() < cfg.urgent_ratio else 1
247
+ size = self._sample_service_time(cfg)
248
+ self._incoming_job = {
249
+ "priority": priority,
250
+ "queue": 0,
251
+ "created_step": self._state.step_count,
252
+ "wait": 0.0,
253
+ "size": size,
254
+ "remaining": size,
255
+ "deadline": self._state.step_count + cfg.deadline_base - (1 if priority == 2 else 0),
256
+ "type": 1 if priority == 2 else 0,
257
+ "stage": 0,
258
+ }
259
+ self._metrics["arrivals"] += 1.0
260
+
261
+ def _update_wait_and_abandonment(self, cfg: TaskConfig) -> float:
262
+ abandonment_rng = self._rng("abandonment")
263
+ abandoned_this_step = 0.0
264
+ for qi, q in enumerate(self._queues):
265
+ kept: deque[dict] = deque()
266
+ while q:
267
+ job = q.popleft()
268
+ job["wait"] += 1.0
269
+ patience = cfg.deadline_base + (2 if job["priority"] == 2 else 4)
270
+ if cfg.task_id == "hard" and job["wait"] > patience and abandonment_rng.random() < 0.35:
271
+ abandoned_this_step += 1.0
272
+ continue
273
+ kept.append(job)
274
+ self._queues[qi] = kept
275
+ if abandoned_this_step:
276
+ self._metrics["abandoned"] += abandoned_this_step
277
+ return abandoned_this_step
278
+
279
+ def _complete_job(self, cfg: TaskConfig, job: dict) -> None:
280
+ if cfg.two_stage and job["stage"] == 0:
281
+ forwarded = dict(job)
282
+ forwarded["stage"] = 1
283
+ forwarded["queue"] = min(1, len(self._queues) - 1)
284
+ forwarded["remaining"] = self._sample_service_time(cfg)
285
+ self._queues[forwarded["queue"]].append(forwarded)
286
+ return
287
+
288
+ self._metrics["completed"] += 1.0
289
+ wait = float(self._state.step_count - job["created_step"])
290
+ self._metrics["wait_sum"] += wait
291
+ self._metrics["wait_count"] += 1.0
292
+ self._wait_samples_all.append(wait)
293
+ self._e2e_wait_samples.append(wait)
294
+ if job["priority"] == 2:
295
+ self._metrics["completed_urgent"] += 1.0
296
+ self._metrics["wait_sum_urgent"] += wait
297
+ self._metrics["wait_count_urgent"] += 1.0
298
+ self._wait_samples_urgent.append(wait)
299
+ else:
300
+ self._metrics["wait_sum_normal"] += wait
301
+ self._metrics["wait_count_normal"] += 1.0
302
+ self._wait_samples_normal.append(wait)
303
+ if self._state.step_count > job["deadline"]:
304
+ self._metrics["sla_breaches"] += 1.0
305
+ if job["priority"] == 2:
306
+ self._metrics["sla_breaches_urgent"] += 1.0
307
+
308
+ def _process_servers(self, cfg: TaskConfig) -> float:
309
+ completed_this_step = 0.0
310
+ for si, server in enumerate(self._servers):
311
+ if not server["active"]:
312
+ continue
313
+ if server["remaining"] > 0:
314
+ server["remaining"] = max(0.0, server["remaining"] - 1.0)
315
+ if server["remaining"] <= 0 and server["job"] is not None:
316
+ self._complete_job(cfg, server["job"])
317
+ completed_this_step += 1.0
318
+ server["job"] = None
319
+ busy_flag = 1.0 if server["job"] is not None else 0.0
320
+ if si < len(self._utilization_ema):
321
+ self._utilization_ema[si] = 0.9 * self._utilization_ema[si] + 0.1 * busy_flag
322
+ return completed_this_step
323
+
324
+ def _admit_job(self, cfg: TaskConfig, queue_idx: int) -> tuple[bool, str]:
325
+ if self._incoming_job is None:
326
+ return False, "no_incoming_job"
327
+ if queue_idx < 0 or queue_idx >= len(self._queues):
328
+ return False, "invalid_queue"
329
+ if len(self._queues[queue_idx]) >= cfg.max_queue_size:
330
+ self._metrics["rejected"] += 1.0
331
+ self._incoming_job = None
332
+ return True, "queue_full_rejected"
333
+ job = dict(self._incoming_job)
334
+ job["queue"] = queue_idx
335
+ self._queues[queue_idx].append(job)
336
+ self._incoming_job = None
337
+ self._metrics["accepted"] += 1.0
338
+ return True, "admitted"
339
+
340
+ def _dispatch(self, queue_idx: int | None) -> tuple[bool, str]:
341
+ target = 0 if queue_idx is None else queue_idx
342
+ if target < 0 or target >= len(self._queues):
343
+ return False, "invalid_dispatch_queue"
344
+ for server in self._servers:
345
+ if not server["active"]:
346
+ continue
347
+ if server["job"] is None and self._queues[target]:
348
+ server["job"] = self._queues[target].popleft()
349
+ server["remaining"] = server["job"]["remaining"]
350
+ return True, "dispatched"
351
+ return False, "no_idle_server_or_empty_queue"
352
+
353
+ def _autodispatch(self) -> None:
354
+ for server in self._servers:
355
+ if not server["active"] or server["job"] is not None:
356
+ continue
357
+ for q in self._queues:
358
+ if q:
359
+ server["job"] = q.popleft()
360
+ server["remaining"] = server["job"]["remaining"]
361
+ break
362
+
363
+ def _apply_action(self, action: CloudQueueAction, cfg: TaskConfig) -> tuple[bool, str]:
364
+ action_type = (action.action_type or "noop").lower()
365
+
366
+ if action_type == "configure_task":
367
+ if action.task_id and action.task_id in self._task_configs:
368
+ self._pending_task_id = action.task_id
369
+ if action.seed is not None:
370
+ self._pending_seed = int(action.seed)
371
+ return True, "configuration_updated_for_next_reset"
372
+
373
+ if self._done:
374
+ return False, "episode_already_done"
375
+
376
+ if action_type == "admit":
377
+ queue_idx = action.target_queue if action.target_queue is not None else 0
378
+ return self._admit_job(cfg, queue_idx)
379
+
380
+ if action_type == "reject":
381
+ if self._incoming_job is None:
382
+ return False, "no_incoming_job"
383
+ self._incoming_job = None
384
+ self._metrics["rejected"] += 1.0
385
+ return True, "rejected"
386
+
387
+ if action_type == "route":
388
+ queue_idx = action.target_queue if action.target_queue is not None else 0
389
+ return self._admit_job(cfg, queue_idx)
390
+
391
+ if action_type == "dispatch":
392
+ return self._dispatch(action.target_queue)
393
+
394
+ if action_type == "scale":
395
+ if not cfg.allow_scaling:
396
+ return False, "scaling_not_supported_for_task"
397
+ delta = action.scale_delta if action.scale_delta is not None else 0
398
+ if delta == 0:
399
+ return True, "no_scale_change"
400
+ active_count = sum(1 for s in self._servers if s["active"])
401
+ requested = int(self._clamp(active_count + delta, cfg.min_servers, cfg.max_servers))
402
+ if requested == active_count:
403
+ return True, "scale_clamped_no_change"
404
+ if requested > active_count:
405
+ for _ in range(requested - active_count):
406
+ self._servers.append({"remaining": 0.0, "job": None, "active": True})
407
+ self._utilization_ema.append(0.0)
408
+ else:
409
+ to_disable = active_count - requested
410
+ for server in reversed(self._servers):
411
+ if to_disable == 0:
412
+ break
413
+ if server["active"] and server["job"] is None:
414
+ server["active"] = False
415
+ to_disable -= 1
416
+ self._metrics["action_cost"] += abs(delta) * 0.35
417
+ return True, "scaled"
418
+
419
+ if action_type == "reprioritize":
420
+ if not cfg.allow_priority:
421
+ return False, "reprioritize_not_supported_for_task"
422
+ new_priority = 2 if (action.new_priority or 1) >= 2 else 1
423
+ for q in self._queues:
424
+ for job in q:
425
+ if job["priority"] == 1:
426
+ job["priority"] = new_priority
427
+ return True, "reprioritized"
428
+ return False, "no_eligible_job"
429
+
430
+ if action_type == "noop":
431
+ return True, "noop"
432
+
433
+ return False, "unknown_action_type"
434
+
435
+ def _percentile(self, values: list[float], p: float) -> float:
436
+ if not values:
437
+ return 0.0
438
+ ordered = sorted(values)
439
+ idx = int(self._clamp(round((len(ordered) - 1) * p), 0, len(ordered) - 1))
440
+ return float(ordered[idx])
441
+
442
+ def _safe_div(self, numerator: float, denominator: float) -> float:
443
+ if denominator <= 0:
444
+ return 0.0
445
+ return numerator / denominator
446
+
447
+ def _current_fairness_gap(self) -> float:
448
+ urgent_avg = self._safe_div(self._metrics["wait_sum_urgent"], self._metrics["wait_count_urgent"])
449
+ normal_avg = self._safe_div(self._metrics["wait_sum_normal"], self._metrics["wait_count_normal"])
450
+ scale = max(1.0, urgent_avg + normal_avg)
451
+ return abs(urgent_avg - normal_avg) / scale
452
+
453
+ def _compute_reward(
454
+ self,
455
+ cfg: TaskConfig,
456
+ action_ok: bool,
457
+ action_type: str,
458
+ action_scale_delta: int,
459
+ completed_step: float,
460
+ ) -> tuple[float, dict[str, float]]:
461
+ avg_wait = self._safe_div(self._metrics["wait_sum"], self._metrics["wait_count"])
462
+ queue_pressure = sum(len(q) for q in self._queues) / max(1.0, float(cfg.max_queue_size))
463
+ r_wait = -self._clamp(avg_wait / max(cfg.deadline_base, 1), 0.0, 1.5) - 0.15 * self._clamp(queue_pressure, 0.0, 1.5)
464
+ r_throughput = self._clamp(completed_step / max(1.0, float(cfg.initial_servers)), 0.0, 1.0)
465
+ total_decisions = max(1.0, self._metrics["completed"] + self._metrics["abandoned"])
466
+ r_sla = -self._clamp(self._metrics["sla_breaches"] / total_decisions, 0.0, 1.0)
467
+ active_servers = sum(1 for s in self._servers if s["active"])
468
+ r_cost = -self._clamp(active_servers / max(1.0, float(cfg.max_servers)), 0.0, 1.0)
469
+ fairness_gap = self._current_fairness_gap()
470
+ r_fair = -self._clamp(fairness_gap / 0.5, 0.0, 1.0)
471
+ r_safe = 0.0 if action_ok else -1.0
472
+ if not action_ok:
473
+ self._metrics["invalid_actions"] += 1.0
474
+ if action_type == "noop" and self._incoming_job is not None and sum(len(q) for q in self._queues) > 0:
475
+ r_safe -= 0.05
476
+ self._metrics["noop_under_load"] += 1.0
477
+
478
+ arrivals = max(1.0, self._metrics["arrivals"])
479
+ rejection_rate = self._safe_div(self._metrics["rejected"], arrivals)
480
+ if arrivals > 10 and rejection_rate > 0.4:
481
+ r_safe -= self._clamp((rejection_rate - 0.4) * 0.4, 0.0, 0.2)
482
+
483
+ if action_type == "scale" and action_scale_delta < 0 and queue_pressure > 0.45:
484
+ overload_penalty = self._clamp((queue_pressure - 0.45) * 0.5, 0.0, 0.25)
485
+ r_safe -= overload_penalty
486
+ self._metrics["harmful_scale_down"] += 1.0
487
+
488
+ reward = 0.35 * r_wait + 0.20 * r_throughput + 0.20 * r_sla + 0.15 * r_cost + 0.05 * r_fair + 0.05 * r_safe
489
+ reward = self._clamp(reward, -1.0, 1.0)
490
+ self._recent_rewards.append(reward)
491
+
492
+ self._metrics["infra_cost"] += active_servers * cfg.server_cost
493
+ self._metrics["fairness_gap_sum"] += fairness_gap
494
+ self._metrics["fairness_gap_count"] += 1.0
495
+
496
+ components = {
497
+ "wait": round(r_wait, 4),
498
+ "throughput": round(r_throughput, 4),
499
+ "sla": round(r_sla, 4),
500
+ "cost": round(r_cost, 4),
501
+ "fairness": round(r_fair, 4),
502
+ "safety": round(r_safe, 4),
503
+ }
504
+ return reward, components
505
+
+    def _score_task(self, cfg: TaskConfig) -> tuple[float, dict[str, float]]:
+        # c01: clamp individual sub-score components to [0, 1] inclusive.
+        def c01(value: float) -> float:
+            if not math.isfinite(value):
+                return 0.0
+            return self._clamp(value, 0.0, 1.0)
+
+        # strict01: final clamp applied only to the episode score.
+        # Validator requires score strictly in (0, 1) — never 0.0 or 1.0.
+        _SCORE_MIN = 0.001
+        _SCORE_MAX = 0.999
+
+        def strict01(value: float) -> float:
+            if not math.isfinite(value):
+                return _SCORE_MIN
+            return self._clamp(value, _SCORE_MIN, _SCORE_MAX)
+
+        completed = self._metrics["completed"]
+        arrivals = self._metrics["arrivals"]
+        rejected = self._metrics["rejected"]
+        avg_wait = self._safe_div(self._metrics["wait_sum"], self._metrics["wait_count"])
+        rejection_rate = self._safe_div(rejected, arrivals)
+        sla_rate = self._safe_div(self._metrics["sla_breaches"], max(1.0, completed))
+        throughput = completed
+        fairness_gap = self._safe_div(self._metrics["fairness_gap_sum"], self._metrics["fairness_gap_count"])
+
+        if cfg.task_id == "easy":
+            score_wait = c01(1.0 - avg_wait / cfg.score_refs["wait"])
+            score_thr = c01(throughput / cfg.score_refs["thr"])
+            score_rej = c01(1.0 - rejection_rate / cfg.score_refs["rej"])
+            score_sla = c01(1.0 - sla_rate / cfg.score_refs["sla"])
+            score = 0.4 * score_wait + 0.3 * score_thr + 0.15 * score_rej + 0.15 * score_sla
+            details = {
+                "score_wait": round(score_wait, 4),
+                "score_throughput": round(score_thr, 4),
+                "score_rejection": round(score_rej, 4),
+                "score_sla": round(score_sla, 4),
+            }
+        elif cfg.task_id == "medium":
+            p95_u = self._percentile(self._wait_samples_urgent, 0.95)
+            p95_n = self._percentile(self._wait_samples_normal, 0.95)
+            urgent_sla = self._safe_div(self._metrics["sla_breaches_urgent"], max(1.0, self._metrics["completed_urgent"]))
+            s_uw = c01(1.0 - p95_u / cfg.score_refs["uw"])
+            s_nw = c01(1.0 - p95_n / cfg.score_refs["nw"])
+            s_usla = c01(1.0 - urgent_sla / cfg.score_refs["usla"])
+            s_thr = c01(throughput / cfg.score_refs["thr"])
+            s_cost = c01(1.0 - self._metrics["action_cost"] / cfg.score_refs["cost"])
+            score = 0.35 * s_uw + 0.15 * s_nw + 0.25 * s_usla + 0.15 * s_thr + 0.10 * s_cost
+            details = {
+                "score_urgent_wait": round(s_uw, 4),
+                "score_normal_wait": round(s_nw, 4),
+                "score_urgent_sla": round(s_usla, 4),
+                "score_throughput": round(s_thr, 4),
+                "score_cost": round(s_cost, 4),
+            }
+        else:
+            e2e_p95 = self._percentile(self._e2e_wait_samples, 0.95)
+            abd_rate = self._safe_div(self._metrics["abandoned"], arrivals)
+            s_e2e = c01(1.0 - e2e_p95 / cfg.score_refs["e2e"])
+            s_abd = c01(1.0 - abd_rate / cfg.score_refs["abd"])
+            s_sla = c01(1.0 - sla_rate / cfg.score_refs["sla"])
+            s_thr = c01(throughput / cfg.score_refs["thr"])
+            s_cost = c01(1.0 - self._metrics["infra_cost"] / cfg.score_refs["cost"])
+            s_fair = c01(1.0 - fairness_gap / cfg.score_refs["fair"])
+            score = 0.25 * s_e2e + 0.20 * s_abd + 0.20 * s_sla + 0.15 * s_thr + 0.10 * s_cost + 0.10 * s_fair
+            details = {
+                "score_e2e_p95": round(s_e2e, 4),
+                "score_abandonment": round(s_abd, 4),
+                "score_sla": round(s_sla, 4),
+                "score_throughput": round(s_thr, 4),
+                "score_cost": round(s_cost, 4),
+                "score_fairness": round(s_fair, 4),
+            }
+
+        if self._metrics["invalid_actions"] > max(3.0, 0.04 * cfg.horizon):
+            score = min(score, 0.4)
+        # Apply strict open-interval clamp: validator rejects 0.0 and 1.0.
+        return strict01(score), details
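The open-interval clamp is small enough to check in isolation. A standalone sketch (constants and control flow mirror the diff; the module-level names are otherwise illustrative):

```python
import math

SCORE_MIN, SCORE_MAX = 0.001, 0.999  # validator accepts only scores strictly inside (0, 1)

def clamp(value: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, value))

def strict01(value: float) -> float:
    # Non-finite inputs (NaN/inf) collapse to the floor instead of propagating.
    if not math.isfinite(value):
        return SCORE_MIN
    return clamp(value, SCORE_MIN, SCORE_MAX)

print(strict01(0.0))           # 0.001
print(strict01(float("nan")))  # 0.001
```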
+
+    def _compute_action_mask(self, cfg: TaskConfig) -> list[int]:
+        """Compute which of the 8 actions are valid right now.
+
+        Slot order (matches CloudQueueAction.action_type):
+          0: configure_task — always valid (meta, sets next task/seed)
+          1: admit — only if an incoming job is waiting
+          2: reject — only if an incoming job is waiting
+          3: route — only if an incoming job is waiting
+          4: dispatch — only if an idle+active server AND a non-empty queue exist
+          5: scale — only if cfg.allow_scaling is True
+          6: reprioritize — only if cfg.allow_priority AND a normal-priority job is queued
+          7: noop — always valid
+        """
+        has_incoming = self._incoming_job is not None
+
+        has_idle_server = any(
+            s["active"] and s["job"] is None for s in self._servers
+        )
+        has_queued_job = any(len(q) > 0 for q in self._queues)
+        can_dispatch = 1 if (has_idle_server and has_queued_job) else 0
+
+        can_reprioritize = 0
+        if cfg.allow_priority:
+            can_reprioritize = 1 if any(
+                job["priority"] == 1 for q in self._queues for job in q
+            ) else 0
+
+        return [
+            1,                          # 0: configure_task
+            1 if has_incoming else 0,   # 1: admit
+            1 if has_incoming else 0,   # 2: reject
+            1 if has_incoming else 0,   # 3: route
+            can_dispatch,               # 4: dispatch
+            1 if cfg.allow_scaling else 0,  # 5: scale
+            can_reprioritize,           # 6: reprioritize
+            1,                          # 7: noop
+        ]
+
+    def _build_observation(self, reward: float, done: bool, info: dict) -> CloudQueueObservation:
+        cfg = self._task_configs[self._active_task_id]
+        queue_lengths = [len(q) for q in self._queues]
+        for i, q in enumerate(self._queues):
+            current_mean_wait = 0.0
+            if q:
+                current_mean_wait = sum(job["wait"] for job in q) / len(q)
+            self._wait_ema[i] = 0.8 * self._wait_ema[i] + 0.2 * current_mean_wait
+
+        active_servers = max(1, sum(1 for s in self._servers if s["active"]))
+        completed = max(1.0, self._metrics["completed"])
+        sla_violation_rate = self._safe_div(self._metrics["sla_breaches"], completed)
+        abandonment_rate = self._safe_div(self._metrics["abandoned"], max(1.0, self._metrics["arrivals"]))
+        throughput_recent = max(0.0, info.get("completed_this_step", 0.0))
+        energy_cost_rate = active_servers * cfg.server_cost
+
+        incoming = self._incoming_job
+        incoming_present = incoming is not None
+        incoming_size = float(incoming["size"]) if incoming_present else 0.0
+        incoming_priority = int(incoming["priority"]) if incoming_present else 0
+        incoming_deadline = float(incoming["deadline"]) if incoming_present else 0.0
+        incoming_type = int(incoming["type"]) if incoming_present else 0
+
+        score, score_details = (0.0, {})
+        if done:
+            score, score_details = self._score_task(cfg)
+
+        metadata = {
+            "info": info,
+            "reward_components": info.get("reward_components", {}),
+            "applied_action": info.get("applied_action", "noop"),
+            "seed": int(self._pending_seed),
+            "trace_digest": self._trace_digest(),
+            "rng_stream_seeds": self._rng_stream_seeds,
+            "metrics": {
+                "arrivals": self._metrics["arrivals"],
+                "accepted": self._metrics["accepted"],
+                "rejected": self._metrics["rejected"],
+                "completed": self._metrics["completed"],
+                "abandoned": self._metrics["abandoned"],
+                "invalid_actions": self._metrics["invalid_actions"],
+                "harmful_scale_down": self._metrics["harmful_scale_down"],
+                "infra_cost": round(self._metrics["infra_cost"], 4),
+            },
+            "episode_score": round(score, 4),
+            "score_details": score_details,
+        }
+
+        return CloudQueueObservation(
+            task_id=cfg.task_id,
+            sim_time=self._sim_time,
+            horizon=cfg.horizon,
+            queue_lengths=queue_lengths,
+            queue_wait_ema=[round(v, 3) for v in self._wait_ema],
+            server_busy=[1 if s["job"] is not None and s["active"] else 0 for s in self._servers],
+            server_remaining_service=[round(float(s["remaining"]), 3) for s in self._servers],
+            utilization=[round(v, 3) for v in self._utilization_ema[: len(self._servers)]],
+            incoming_job_present=incoming_present,
+            incoming_job_size=round(incoming_size, 3),
+            incoming_job_priority=incoming_priority,
+            incoming_job_deadline=round(incoming_deadline, 3),
+            incoming_job_type=incoming_type,
+            sla_violation_rate=round(sla_violation_rate, 4),
+            abandonment_rate=round(abandonment_rate, 4),
+            throughput_recent=round(throughput_recent, 4),
+            energy_cost_rate=round(energy_cost_rate, 4),
+            level=cfg.level,
+            optional_history=[round(v, 4) for v in list(self._recent_rewards)],
+            action_mask=self._compute_action_mask(cfg),
+            done=done,
+            reward=round(reward, 6),
+            metadata=metadata,
+        )
+
+    def step(self, action: CloudQueueAction) -> CloudQueueObservation:  # type: ignore[override]
+        cfg = self._task_configs[self._active_task_id]
+
+        if (action.action_type or "").lower() == "configure_task":
+            ok, note = self._apply_action(action, cfg)
+            info = {
+                "event": "configure_task",
+                "applied_action": action.action_type,
+                "valid_action": ok,
+                "note": note,
+                "completed_this_step": 0.0,
+                "debug_trace_id": self._trace_digest(),
+            }
+            return self._build_observation(reward=0.0, done=self._done, info=info)
+
+        if self._done:
+            info = {
+                "event": "episode_done",
+                "applied_action": action.action_type,
+                "valid_action": False,
+                "note": "call reset() to start a new episode",
+                "completed_this_step": 0.0,
+                "reward_components": {},
+                "debug_trace_id": self._trace_digest(),
+            }
+            return self._build_observation(reward=0.0, done=True, info=info)
+
+        self._state.step_count += 1
+        self._sim_time += 1
+
+        completed_this_step = self._process_servers(cfg)
+        abandoned_this_step = self._update_wait_and_abandonment(cfg)
+        self._spawn_incoming_job(cfg)
+
+        action_ok, action_note = self._apply_action(action, cfg)
+        action_key = (
+            f"{(action.action_type or 'noop').lower()}|"
+            f"q={action.target_queue}|s={action.target_server}|"
+            f"d={action.scale_delta}|p={action.new_priority}"
+        )
+        self._action_trace.append(action_key)
+        self._autodispatch()
+        reward, reward_components = self._compute_reward(
+            cfg,
+            action_ok=action_ok,
+            action_type=(action.action_type or "noop").lower(),
+            action_scale_delta=int(action.scale_delta or 0),
+            completed_step=completed_this_step,
+        )
+
+        self._done = self._state.step_count >= cfg.horizon
+        info = {
+            "event": "step",
+            "applied_action": action.action_type,
+            "valid_action": action_ok,
+            "note": action_note,
+            "completed_this_step": completed_this_step,
+            "abandoned_this_step": abandoned_this_step,
+            "reward_components": reward_components,
+            "debug_trace_id": self._trace_digest(),
+        }
+        return self._build_observation(reward=reward, done=self._done, info=info)
+
+    @property
+    def state(self) -> State:
+        return self._state
server/requirements.txt ADDED
@@ -0,0 +1,6 @@
+ openenv[core]>=0.2.0
+ fastapi>=0.115.0
+ uvicorn>=0.24.0
+
+
+
uv.lock ADDED
The diff for this file is too large to render. See raw diff