taarunforge committed on
Commit
dfbb493
·
1 Parent(s): bd03bab

Deploy SpectraQual OpenEnv environment

Dockerfile ADDED
@@ -0,0 +1,37 @@
1
+ # ── Base image ───────────────────────────────────────────────────────────────
2
+ FROM python:3.11-slim
3
+
4
+ # ── Metadata ─────────────────────────────────────────────────────────────────
5
+ LABEL maintainer="SpectraQual Team"
6
+ LABEL description="SpectraQual — PCB Quality Control OpenEnv Environment"
7
+ LABEL version="1.0.0"
8
+
9
+ # ── System deps ───────────────────────────────────────────────────────────────
10
+ RUN apt-get update && apt-get install -y --no-install-recommends \
11
+ curl \
12
+ && rm -rf /var/lib/apt/lists/*
13
+
14
+ # ── Working directory ─────────────────────────────────────────────────────────
15
+ WORKDIR /app
16
+
17
+ # ── Install Python dependencies first (layer cache) ──────────────────────────
18
+ COPY requirements.txt .
19
+ RUN pip install --no-cache-dir -r requirements.txt
20
+
21
+ # ── Copy source code ──────────────────────────────────────────────────────────
22
+ COPY . .
23
+
24
+ # ── Environment variables (overridden at runtime) ─────────────────────────────
25
+ ENV API_BASE_URL="https://openrouter.ai/api/v1"
26
+ ENV MODEL_NAME="meta-llama/llama-3.3-70b-instruct"
27
+ ENV HF_TOKEN=""
28
+
29
+ # ── Expose API port (HF Spaces default) ───────────────────────────────────────
30
+ EXPOSE 7860
31
+
32
+ # ── Health check ──────────────────────────────────────────────────────────────
33
+ HEALTHCHECK --interval=30s --timeout=10s --start-period=15s --retries=3 \
34
+ CMD curl -f http://localhost:7860/ || exit 1
35
+
36
+ # ── Default command: launch FastAPI server ───────────────────────────────
37
+ CMD ["uvicorn", "src.api:app", "--host", "0.0.0.0", "--port", "7860"]
README.md CHANGED
@@ -1,12 +1,235 @@
1
  ---
2
- title: Spectraqual
3
- emoji: 😻
4
- colorFrom: pink
5
- colorTo: indigo
6
- sdk: docker
7
- pinned: false
8
- license: mit
9
- short_description: "PCB quality-control triage OpenEnv environment\t"
10
  ---
11
 
12
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
1
+ # SpectraQual — PCB Smart Quality-Control OpenEnv Environment
2
+
3
+ [![OpenEnv](https://img.shields.io/badge/OpenEnv-Compliant-00e5ff?style=flat-square)](https://github.com/openenv)
4
+ [![Python](https://img.shields.io/badge/Python-3.11-blue?style=flat-square)](https://python.org)
5
+ [![License](https://img.shields.io/badge/License-MIT-green?style=flat-square)](LICENSE)
6
+
7
+ **SpectraQual** is a real-world AI environment that simulates smart quality-control triage for Printed Circuit Boards (PCBs) in a manufacturing factory.
8
+
9
+ An AI agent receives a stream of PCBs, each with a different defect type, component cost, and criticality score. The agent must choose the optimal economic action (Pass, Scrap, Route to Repair, Wait) while managing a shared factory soldering slot queue.
10
+
11
+ > **Why this problem matters:** PCB triage is a real, high-stakes manufacturing task. Wrong decisions mean wasted boards, bottlenecked production lines, and downstream electronics failures. SpectraQual models this as an RL environment where an agent must balance economic value, operational constraints, and risk — a setting where LLM agents can be meaningfully evaluated.
12
+
13
+ ---
14
+
15
+ ## Environment Overview
16
+
17
+ | Property | Value |
18
+ |---|---|
19
+ | **Domain** | Smart Manufacturing / Industrial AI |
20
+ | **Tasks** | 3 (Easy → Hard) |
21
+ | **Action Space** | 6 discrete actions |
22
+ | **Observation Space** | 13 fields (typed Pydantic model) |
23
+ | **Reward Range** | `[0.0, 1.0]` normalized |
24
+ | **Reward Signal** | Dense (per-step), 5 components |
25
+ | **Seeded / Reproducible** | Yes |
26
+ | **Anomaly Detection** | Yes |
27
+ | **OpenEnv Spec** | Compliant |
28
+
29
  ---
30
+
31
+ ## Action Space
32
+
33
+ | Action | Description | Valid When |
34
+ |---|---|---|
35
+ | `PASS` | Clear the board — no defect | `defect_type = none` |
36
+ | `SCRAP` | Discard the board | Any defect |
37
+ | `ROUTE_COMPONENT_REPLACEMENT` | Send to component repair | `missing_component` |
38
+ | `ROUTE_SOLDERING` | Send to soldering station | `solder_bridge` |
39
+ | `ROUTE_DIAGNOSTICS` | Send for investigation | `short_circuit` |
40
+ | `WAIT` | Hold board until slot free | `solder_bridge` (no slot) |
41
+
42
  ---
43
 
44
+ ## Observation Space
45
+
46
+ ```python
47
+ class PCBObservation(BaseModel):
48
+ board_id: str # Unique PCB ID (e.g. "SQ-4321")
49
+ defect_type: str # "none" | "missing_component" | "solder_bridge" | "short_circuit"
50
+ component_cost: float # Replacement cost ₹10–200
51
+ criticality: float # Risk score 0.1–1.0
52
+ slots_free: int # Available soldering slots
53
+ slots_state: List[int] # Remaining time per slot (0=free, -1=locked)
54
+ is_anomaly: bool # True if board is rare/extreme
55
+ anomaly_score: float # Anomaly confidence 0.0–1.0
56
+ valid_actions: List[str] # Permitted actions for this defect
57
+ rolling_accuracy: float # Fraction of correct decisions so far
58
+ throughput: float # Boards/step so far
59
+ cumulative_reward: float # Episode cumulative reward
60
+ step: int # Current step number
61
+ ```
62
+
63
+ ---
64
+
65
+ ## Reward Function
66
+
67
+ Reward is **dense** (given every step) and **decomposed into 5 interpretable components**, all normalized to `[0.0, 1.0]`:
68
+
69
+ | Component | Weight | Description |
70
+ |---|---|---|
71
+ | `defect_reward` | 35% | Correctness of the action for the defect type |
72
+ | `cost_efficiency` | 25% | Economic value retained vs. lost |
73
+ | `queue_penalty` | 20% | Factory bottleneck avoidance |
74
+ | `criticality_factor` | 10% | Risk-adjusted multiplier |
75
+ | `anomaly_bonus` | 10% | Correct handling of anomalous boards |
76
+
77
+ **Final reward** = weighted sum of all 5 components, clamped to `[0.0, 1.0]`.
78
+
79
+ Every `StepResult` includes a full `RewardComponents` object whose `explanation` field states why the reward was given, so each step's reward can be audited.
80
+
81
+ ---
82
+
83
+ ## Tasks
84
+
85
+ ### Task Easy (`task_easy`)
86
+ - **Boards:** 10 | **Seed:** 42 | **Slots:** 3 | **Anomaly Rate:** 0%
87
+ - **Objective:** Correctly classify all defect types. No slot pressure.
88
+ - **Grader:** `0.70 × accuracy + 0.30 × avg_reward`
89
+ - **Expected frontier model score:** ≥ 0.85
90
+
91
+ ### Task Medium (`task_medium`)
92
+ - **Boards:** 15 | **Seed:** 99 | **Slots:** 1 | **Anomaly Rate:** 10%
93
+ - **Objective:** Triage boards with one soldering slot — manage queue pressure.
94
+ - **Grader:** `0.60 × economic_efficiency + 0.40 × bottleneck_avoidance`
95
+ - **Expected frontier model score:** ≥ 0.65
96
+
97
+ ### Task Hard (`task_hard`)
98
+ - **Boards:** 20 | **Seed:** 777 | **Slots:** 1 | **Anomaly Rate:** 25%
99
+ - **Objective:** Handle anomalous boards safely AND maintain throughput with tight slots.
100
+ - **Grader:** `0.50 × anomaly_score + 0.30 × economic_score + 0.20 × throughput_score`
101
+ - **Expected frontier model score:** ≥ 0.50
102
+
103
+ ---
104
+
105
+ ## Setup & Usage
106
+
107
+ ### Prerequisites
108
+
109
+ ```bash
110
+ # Requires Python >= 3.11
111
+ pip install -r requirements.txt
112
+ ```
113
+
114
+ ### 1) Launch the Streamlit Dashboard
115
+
116
+ ```bash
117
+ streamlit run src/app.py
118
+ ```
119
+
120
+ ### 2) Run the LLM Inference Script
121
+
122
+ ```bash
123
+ # Set environment variables
124
+ export API_BASE_URL="https://openrouter.ai/api/v1"
125
+ export MODEL_NAME="meta-llama/llama-3.3-70b-instruct"
126
+ export HF_TOKEN="hf_your_key_here"
127
+
128
+ # Run baseline inference
129
+ python inference.py
130
+ ```
131
+
132
+ ### 3) Run Task Grader Sanity Check
133
+
134
+ ```bash
135
+ cd src
136
+ python tasks.py
137
+ ```
138
+
139
+ ### 4) Train the Q-learning Agent
140
+
141
+ ```bash
142
+ python src/train.py
143
+ ```
144
+
145
+ ### 5) Run CLI Simulation (rule-based)
146
+
147
+ ```bash
148
+ python src/main.py
149
+ ```
150
+
151
+ ---
152
+
153
+ ## Docker
154
+
155
+ ```bash
156
+ # Build
157
+ docker build -t spectraqual .
158
+
159
+ # Run the API server (default — what HF Spaces runs)
160
+ # Exposes: GET / | POST /reset | POST /step | GET /state
161
+ docker run -p 7860:7860 spectraqual
162
+ # → API docs available at http://localhost:7860/docs
163
+
164
+ # Run inference inside container
165
+ docker run \
166
+ -e API_BASE_URL="https://openrouter.ai/api/v1" \
167
+ -e MODEL_NAME="meta-llama/llama-3.3-70b-instruct" \
168
+ -e HF_TOKEN="hf_..." \
169
+ --entrypoint python spectraqual inference.py
170
+
171
+ # Run Streamlit dashboard locally (NOT the Docker default — local dev only)
172
+ streamlit run src/app.py --server.port 8501
173
+ ```
174
+
175
+ ---
176
+
177
+ ## Project Structure
178
+
179
+ ```
180
+ spectraqual/
181
+ ├── inference.py # Root LLM baseline script (required by OpenEnv)
182
+ ├── openenv.yaml # OpenEnv spec metadata
183
+ ├── Dockerfile # Container definition
184
+ ├── requirements.txt # Dependencies with bounded version ranges
185
+ ├── README.md # This file
186
+ └── src/
187
+ ├── config.py # All constants, task configs, reward weights
188
+ ├── models.py # Pydantic typed models (Observation, Action, Reward)
189
+ ├── env.py # SpectraQualEnv class (reset/step/state + legacy wrappers)
190
+ ├── reward.py # Multi-component normalized reward calculator
191
+ ├── tasks.py # 3 tasks + programmatic graders
192
+ ├── agent.py # Q-learning agent (baseline model zoo)
193
+ ├── app.py # Streamlit dashboard
194
+ ├── train.py # Offline Q-table trainer
195
+ └── main.py # Rule-based CLI runner
196
+ ```
197
+
198
+ ---
199
+
200
+ ## Baseline Scores
201
+
202
+ | Agent | task_easy | task_medium | task_hard |
203
+ |---|---|---|---|
204
+ | Rule-based | ~0.82 | ~0.61 | ~0.48 |
205
+ | LLM (llama-3.3-70b) | TBD | TBD | TBD |
206
+ | Q-learning (trained) | TBD | TBD | TBD |
207
+
208
+ ---
209
+
210
+ ## Research Extensions
211
+
212
+ The environment supports:
213
+ - **Anomaly detection mode**: boards with extreme cost+criticality are flagged
214
+ - **Seeded reproducibility**: every task uses a fixed RNG seed
215
+ - **Pluggable agents**: any agent implementing `predict(observation) → action`
216
+ - **Dense reward signal**: sub-rewards for debugging and ablation studies
217
+ - **Explainability**: every step reward comes with a natural-language explanation
218
+ - **Benchmark modes**: noisy observations, partial observability (planned)
219
+
220
+ ---
221
+
222
+ ## Environment Variables for Inference
223
+
224
+ | Variable | Required | Default | Description |
225
+ |---|---|---|---|
226
+ | `API_BASE_URL` | No | `https://openrouter.ai/api/v1` | LLM API endpoint |
227
+ | `MODEL_NAME` | No | `meta-llama/llama-3.3-70b-instruct` | Model identifier |
228
+ | `HF_TOKEN` | Yes (prod) | — | Hugging Face / API key |
229
+ | `LOCAL_IMAGE_NAME` | No | — | Docker image (for containerized env) |
230
+
231
+ ---
232
+
233
+ ## License
234
+
235
+ MIT License — see [LICENSE](LICENSE).
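Note on the README's Reward Function and Tasks sections: the weighted sum and the `task_easy` grader formula can be sketched as below. This is a minimal illustration, not the code in `src/reward.py` or `src/tasks.py`, which are not shown in this diff. The weights and the grader formula are copied from the README; the names `REWARD_WEIGHTS`, `combine_reward`, and `grade_easy` are hypothetical, and the sketch assumes every component is already scaled to `[0.0, 1.0]`.

```python
# Weights as listed in the README's Reward Function table.
REWARD_WEIGHTS = {
    "defect_reward": 0.35,
    "cost_efficiency": 0.25,
    "queue_penalty": 0.20,
    "criticality_factor": 0.10,
    "anomaly_bonus": 0.10,
}


def combine_reward(components: dict[str, float]) -> float:
    """Weighted sum of the five components, clamped to [0.0, 1.0].

    Hypothetical helper: assumes each component is already in [0.0, 1.0].
    """
    total = sum(REWARD_WEIGHTS[name] * components[name] for name in REWARD_WEIGHTS)
    return min(max(total, 0.0), 1.0)


def grade_easy(accuracy: float, avg_reward: float) -> float:
    """task_easy grader formula from the README: 0.70 x accuracy + 0.30 x avg_reward."""
    return 0.70 * accuracy + 0.30 * avg_reward
```

Because the five weights sum to 1.0, the clamp only matters if a component falls outside its expected range.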
inference.py ADDED
@@ -0,0 +1,293 @@
1
+ """
2
+ inference.py — SpectraQual OpenEnv Baseline Inference Script
3
+
4
+ Runs an LLM agent against all 3 SpectraQual tasks and emits structured logs.
5
+
6
+ Environment variables (set before running):
7
+ API_BASE_URL The LLM API endpoint (default: https://openrouter.ai/api/v1)
8
+ MODEL_NAME Model identifier (default: meta-llama/llama-3.3-70b-instruct)
9
+ HF_TOKEN Your Hugging Face / API key (required in production)
10
+
11
+ Usage:
12
+ export HF_TOKEN="hf_xxx..."
13
+ python inference.py
14
+
15
+ Output format:
16
+ [START] task=<id> env=SpectraQual model=<model>
17
+ [STEP] step=<n> action=<A> reward=<r> done=<bool> error=<null|msg>
18
+ [END] success=<bool> steps=<n> score=<f> rewards=[...]
19
+ """
20
+
21
+ from __future__ import annotations
22
+ import json
23
+ import os
24
+ import sys
25
+ import time
26
+ from typing import List, Optional
27
+
28
+ # ── Path setup so we can import from src/ ──────────────────────────────────
29
+ ROOT_DIR = os.path.dirname(os.path.abspath(__file__))
30
+ SRC_DIR = os.path.join(ROOT_DIR, "src")
31
+ sys.path.insert(0, SRC_DIR)
32
+
33
+ from openai import OpenAI
34
+ from env import SpectraQualEnv
35
+ from models import PCBAction, StepResult
36
+ from config import (
37
+ ACTIONS,
38
+ VALID_ACTIONS,
39
+ MAX_STEPS_PER_TASK,
40
+ SUCCESS_SCORE_THRESHOLD,
41
+ TEMPERATURE,
42
+ MAX_TOKENS,
43
+ TASKS,
44
+ )
45
+ from tasks import TASK_DESCRIPTIONS, run_task, grade
46
+
47
+ # ── Environment variables ──────────────────────────────────────────────────
48
+ API_BASE_URL = os.getenv("API_BASE_URL", "https://openrouter.ai/api/v1")
49
+ MODEL_NAME = os.getenv("MODEL_NAME", "meta-llama/llama-3.3-70b-instruct")
50
+ HF_TOKEN = os.getenv("HF_TOKEN")
51
+ API_KEY = HF_TOKEN or os.getenv("OPENAI_API_KEY", "no-key-set")
52
+
53
+ # Optional: if you use from_docker_image() style containerized env
54
+ LOCAL_IMAGE_NAME = os.getenv("LOCAL_IMAGE_NAME")
55
+
56
+ BENCHMARK = "SpectraQual"
57
+ TASK_IDS = ["task_easy", "task_medium", "task_hard"]
58
+
59
+ # ── System prompt for the LLM ──────────────────────────────────────────────
60
+ SYSTEM_PROMPT = """You are a PCB quality-control triage agent.
61
+ You will receive information about a printed circuit board (PCB) including its defect type,
62
+ component cost, criticality score, and available factory soldering slots.
63
+
64
+ You must choose exactly ONE action from the allowed list.
65
+ Respond with ONLY the action name — no explanation, no extra text, no punctuation.
66
+
67
+ Action meanings:
68
+ - PASS → Board has no defect; clear it.
69
+ - SCRAP → Board is too damaged or high-risk; discard it.
70
+ - ROUTE_COMPONENT_REPLACEMENT → Board has a missing component; route to repair.
71
+ - ROUTE_SOLDERING → Board has a solder bridge; send to soldering station.
72
+ - ROUTE_DIAGNOSTICS → Board has an ambiguous fault; send for investigation.
73
+ - WAIT → No soldering slot available; hold the board.
74
+
75
+ Rules:
76
+ - For defect_type=none, you MUST respond PASS.
77
+ - For defect_type=missing_component, choose ROUTE_COMPONENT_REPLACEMENT or SCRAP.
78
+ - For defect_type=solder_bridge, choose ROUTE_SOLDERING, WAIT, or SCRAP.
79
+ - For defect_type=short_circuit, choose SCRAP or ROUTE_DIAGNOSTICS.
80
+ - If slots_free=0 and action=ROUTE_SOLDERING would apply, prefer WAIT instead.
81
+
82
+ Respond with only one word. Example: ROUTE_SOLDERING"""
83
+
84
+
85
+ # ── Prompt builder ─────────────────────────────────────────────────────────
86
+ def build_user_prompt(
87
+ obs,
88
+ step: int,
89
+ last_reward: float,
90
+ history: List[str],
91
+ ) -> str:
92
+ history_txt = "\n".join(history[-5:]) if history else "None"
93
+ anomaly_txt = f"⚠️ ANOMALY DETECTED (score={obs.anomaly_score:.2f})" if obs.is_anomaly else "Normal"
94
+ return f"""=== PCB TRIAGE — Step {step} ===
95
+ Board ID: {obs.board_id}
96
+ Defect Type: {obs.defect_type}
97
+ Component Cost: ₹{obs.component_cost:.2f}
98
+ Criticality: {obs.criticality:.2f}
99
+ Slots Free: {obs.slots_free} / {len(obs.slots_state)}
100
+ Slot State: {obs.slots_state}
101
+ Anomaly: {anomaly_txt}
102
+
103
+ Valid Actions: {", ".join(obs.valid_actions)}
104
+
105
+ Last Reward: {last_reward:.4f}
106
+ Cumulative: {obs.cumulative_reward:.4f}
107
+ Accuracy: {obs.rolling_accuracy:.2%}
108
+
109
+ Recent History:
110
+ {history_txt}
111
+
112
+ Choose exactly one action from: {", ".join(obs.valid_actions)}"""
113
+
114
+
115
+ # ── Structured log helpers ─────────────────────────────────────────────────
116
+ def log_start(task: str, env: str, model: str) -> None:
117
+ print(
118
+ f"[START] task={task} env={env} model={model}",
119
+ flush=True,
120
+ )
121
+
122
+
123
+ def log_step(
124
+ step: int,
125
+ action: str,
126
+ reward: float,
127
+ done: bool,
128
+ error: Optional[str],
129
+ ) -> None:
130
+ error_val = "null" if error is None else f'"{error}"'
131
+ print(
132
+ f"[STEP] step={step} action={action} reward={reward:.4f} done={done} error={error_val}",
133
+ flush=True,
134
+ )
135
+
136
+
137
+ def log_end(
138
+ success: bool,
139
+ steps: int,
140
+ score: float,
141
+ rewards: List[float],
142
+ ) -> None:
143
+ rewards_str = json.dumps([round(r, 4) for r in rewards])
144
+ print(
145
+ f"[END] success={success} steps={steps} score={score:.4f} rewards={rewards_str}",
146
+ flush=True,
147
+ )
148
+
149
+
150
+ # ── LLM call ──────────────────────────────────────────────────────────────
151
+ def get_llm_action(
152
+ client: OpenAI,
153
+ obs,
154
+ step: int,
155
+ last_reward: float,
156
+ history: List[str],
157
+ ) -> str:
158
+ """Ask the LLM for a triage action. Falls back to SCRAP on any error."""
159
+ prompt = build_user_prompt(obs, step, last_reward, history)
160
+ try:
161
+ completion = client.chat.completions.create(
162
+ model=MODEL_NAME,
163
+ messages=[
164
+ {"role": "system", "content": SYSTEM_PROMPT},
165
+ {"role": "user", "content": prompt},
166
+ ],
167
+ temperature=TEMPERATURE,
168
+ max_tokens=MAX_TOKENS,
169
+ stream=False,
170
+ )
171
+ raw = (completion.choices[0].message.content or "").strip().upper()
172
+
173
+ # Validate: pick first word that matches a known action
174
+ for candidate in raw.split():
175
+ candidate = candidate.strip(".,;:!?\"'")
176
+ if candidate in ACTIONS:
177
+ return candidate
178
+
179
+ # Fallback: try to find partial match
180
+ for action in ACTIONS:
181
+ if action in raw:
182
+ return action
183
+
184
+ print(f"[DEBUG] Unexpected model output: {raw!r}", flush=True)
185
+ return "SCRAP"
186
+
187
+ except Exception as exc:
188
+ print(f"[DEBUG] LLM request failed: {exc}", flush=True)
189
+ return "SCRAP"
190
+
191
+
192
+ # ── Single task runner ─────────────────────────────────────────────────────
193
+ def run_task_inference(client: OpenAI, task_id: str) -> tuple[bool, int, float, List[float]]:
194
+ """
195
+ Run the LLM agent against one task.
196
+ Returns (success, steps_taken, score, rewards_list).
197
+ """
198
+ cfg = TASKS[task_id]
199
+ max_steps = min(cfg["n_boards"] + 5, MAX_STEPS_PER_TASK)
200
+ total_reward_cap = cfg["n_boards"] * 1.0 # max possible (1.0 per step)
201
+
202
+ env = SpectraQualEnv(task_id=task_id)
203
+ history: List[str] = []
204
+ rewards: List[float] = []
205
+ action_log: List[str] = []
206
+ steps_taken = 0
207
+ score = 0.0
208
+ success = False
209
+
210
+ log_start(task=task_id, env=BENCHMARK, model=MODEL_NAME)
211
+
212
+ try:
213
+ result = env.reset()
214
+ obs = result.observation
215
+ last_reward = 0.0
216
+
217
+ for step in range(1, max_steps + 1):
218
+ if result.done:
219
+ break
220
+
221
+ # Get action from LLM
222
+ action_str = get_llm_action(client, obs, step, last_reward, history)
223
+ action_log.append(action_str)
224
+
225
+ error = None
226
+ try:
227
+ result = env.step(PCBAction(action=action_str))
228
+ except Exception as e:
229
+ error = str(e)
230
+ result = env.step(PCBAction(action="SCRAP"))
231
+
232
+ obs = result.observation
233
+ reward = result.reward
234
+ done = result.done
235
+ last_reward = reward
236
+
237
+ rewards.append(reward)
238
+ steps_taken = step
239
+
240
+ log_step(step=step, action=action_str, reward=reward, done=done, error=error)
241
+
242
+ history.append(
243
+ f"Step {step}: {action_str!r} → reward={reward:.4f}"
244
+ )
245
+
246
+ if done:
247
+ break
248
+
249
+ # Score = average normalized reward across all steps
250
+ score = sum(rewards) / max(len(rewards), 1)
251
+ score = min(max(score, 0.0), 1.0)
252
+ success = score >= SUCCESS_SCORE_THRESHOLD
253
+
254
+ except Exception as exc:
255
+ print(f"[DEBUG] Task runner error: {exc}", flush=True)
256
+
257
+ finally:
258
+ log_end(success=success, steps=steps_taken, score=score, rewards=rewards)
259
+
260
+ return success, steps_taken, score, rewards
261
+
262
+
263
+ # ── Main ──────────────────────────────────────────────────────────────────
264
+ def main() -> None:
265
+ print(f"[DEBUG] API_BASE_URL = {API_BASE_URL}", flush=True)
266
+ print(f"[DEBUG] MODEL_NAME = {MODEL_NAME}", flush=True)
267
+ print(f"[DEBUG] HF_TOKEN = {'SET' if HF_TOKEN else 'NOT SET (using OPENAI_API_KEY fallback)'}", flush=True)
268
+ print("", flush=True)
269
+
270
+ client = OpenAI(base_url=API_BASE_URL, api_key=API_KEY)
271
+
272
+ all_scores: List[float] = []
273
+
274
+ for task_id in TASK_IDS:
275
+ print(f"\n{'='*60}", flush=True)
276
+ print(f"[DEBUG] Starting {task_id} | {TASK_DESCRIPTIONS[task_id][:80]}...", flush=True)
277
+ print(f"{'='*60}\n", flush=True)
278
+
279
+ success, steps, score, rewards = run_task_inference(client, task_id)
280
+ all_scores.append(score)
281
+
282
+ print(f"\n[DEBUG] {task_id} complete — score={score:.4f} success={success}\n", flush=True)
283
+ time.sleep(1) # brief pause between tasks
284
+
285
+ overall = sum(all_scores) / len(all_scores) if all_scores else 0.0
286
+ print(f"\n{'='*60}", flush=True)
287
+ print(f"[SUMMARY] Overall score={overall:.4f}", flush=True)
288
+ print(f"[SUMMARY] Per-task: { {tid: round(s, 4) for tid, s in zip(TASK_IDS, all_scores)} }", flush=True)
289
+ print(f"{'='*60}\n", flush=True)
290
+
291
+
292
+ if __name__ == "__main__":
293
+ main()
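Note on the structured log format emitted by `inference.py`: a downstream consumer could parse the `[STEP]` lines roughly as follows. This is a sketch under the assumption that lines look exactly like the f-string in `log_step` (with `done` printed as `True`/`False` and `error` as `null` or a quoted message). `STEP_RE` and `parse_step_line` are hypothetical names, not part of the commit.

```python
import re
from typing import Optional

# Mirrors the f-string in log_step(); whitespace between fields is matched leniently.
STEP_RE = re.compile(
    r"^\[STEP\]\s+step=(?P<step>\d+)\s+action=(?P<action>\S+)\s+"
    r"reward=(?P<reward>-?\d+\.\d+)\s+done=(?P<done>True|False)\s+error=(?P<error>.*)$"
)


def parse_step_line(line: str) -> Optional[dict]:
    """Parse one [STEP] log line into a dict, or return None if it does not match."""
    match = STEP_RE.match(line.strip())
    if match is None:
        return None
    raw_error = match.group("error")
    return {
        "step": int(match.group("step")),
        "action": match.group("action"),
        "reward": float(match.group("reward")),
        "done": match.group("done") == "True",
        # log_step writes the literal null when no error occurred.
        "error": None if raw_error == "null" else raw_error.strip('"'),
    }
```

Lines with other prefixes such as `[DEBUG]` or `[SUMMARY]` simply return `None`, so the parser can be applied to the full output stream.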
openenv.yaml ADDED
@@ -0,0 +1,93 @@
1
+ name: spectraqual
2
+ version: "1.0.0"
3
+ description: >
4
+ SpectraQual is a smart PCB (Printed Circuit Board) quality-control triage
5
+ environment. An AI agent processes a stream of boards with randomized defects,
6
+ choosing the optimal economic and operational action under factory slot
7
+ constraints. Rewards are decomposed into 5 interpretable components and
8
+ normalized to [0.0, 1.0] for clean agent training.
9
+ author: "SpectraQual Team"
10
+ tags:
11
+ - pcb
12
+ - manufacturing
13
+ - quality-control
14
+ - industrial-ai
15
+ - real-world
16
+ - openenv
17
+
18
+ # Action space
19
+ actions:
20
+ - PASS
21
+ - SCRAP
22
+ - ROUTE_COMPONENT_REPLACEMENT
23
+ - ROUTE_SOLDERING
24
+ - ROUTE_DIAGNOSTICS
25
+ - WAIT
26
+
27
+ # Observation fields
28
+ observations:
29
+ - board_id: "Unique PCB identifier (string)"
30
+ - defect_type: "none | missing_component | solder_bridge | short_circuit"
31
+ - component_cost: "Replacement cost in ₹ (10.0–200.0)"
32
+ - criticality: "Risk score (0.1–1.0)"
33
+ - slots_free: "Available soldering slots (0–3)"
34
+ - slots_state: "Time remaining per slot (list of ints)"
35
+ - is_anomaly: "True if board has extreme cost+criticality"
36
+ - anomaly_score: "Anomaly confidence (0.0–1.0)"
37
+ - valid_actions: "Actions permitted for this defect type"
38
+ - rolling_accuracy: "Fraction of correct decisions so far"
39
+ - throughput: "Boards processed per step"
40
+ - cumulative_reward: "Episode cumulative normalized reward"
41
+
42
+ # Reward range
43
+ reward:
44
+ min: 0.0
45
+ max: 1.0
46
+ components:
47
+ - defect_reward: "Correctness of decision for defect type (weight=0.35)"
48
+ - cost_efficiency: "Economic value retained vs lost (weight=0.25)"
49
+ - queue_penalty: "Factory bottleneck avoidance (weight=0.20)"
50
+ - criticality_factor: "Risk-adjusted modifier (weight=0.10)"
51
+ - anomaly_bonus: "Correct handling of anomalous boards (weight=0.10)"
52
+
53
+ # Tasks
54
+ tasks:
55
+ - id: task_easy
56
+ description: "Triage 10 boards with no slot pressure. Seed=42."
57
+ difficulty: easy
58
+ n_boards: 10
59
+ seed: 42
60
+ n_slots: 3
61
+ anomaly_rate: 0.0
62
+ expected_score: 0.85
63
+
64
+ - id: task_medium
65
+ description: "Triage 15 boards with 1 soldering slot (queue pressure). Seed=99."
66
+ difficulty: medium
67
+ n_boards: 15
68
+ seed: 99
69
+ n_slots: 1
70
+ anomaly_rate: 0.10
71
+ expected_score: 0.65
72
+
73
+ - id: task_hard
74
+ description: "Triage 20 boards with 25% anomaly rate and tight slot constraints. Seed=777."
75
+ difficulty: hard
76
+ n_boards: 20
77
+ seed: 777
78
+ n_slots: 1
79
+ anomaly_rate: 0.25
80
+ expected_score: 0.50
81
+
82
+ # Interface compliance
83
+ interface:
84
+ reset: "Returns a StepResult carrying the initial PCBObservation, without reward"
85
+ step: "Takes PCBAction, returns StepResult (observation, reward, done, info)"
86
+ state: "Returns full internal environment state as dict"
87
+
88
+ # Deployment
89
+ deployment:
90
+ hf_space: "TAARUNEESHWARAN-027/spectraqual"
91
+ port: 7860
92
+ runtime: fastapi
93
+ api_docs: "/docs"
requirements.txt ADDED
@@ -0,0 +1,21 @@
1
+ # SpectraQual dependencies
2
+ # Version ranges bounded for reproducibility on vcpu=2, memory=8GB machines
3
+
4
+ # Core environment
5
+ pydantic>=2.0.0,<3.0.0
6
+
7
+ # Streamlit dashboard
8
+ streamlit>=1.32.0,<2.0.0
9
+
10
+ # Plotting
11
+ matplotlib>=3.8.0,<4.0.0
12
+
13
+ # LLM inference (OpenAI-compatible client)
14
+ openai>=1.0.0,<2.0.0
15
+
16
+ # HTTP client (used by openai SDK)
17
+ httpx>=0.25.0,<1.0.0
18
+
19
+ # API Endpoints
20
+ fastapi>=0.100.0,<1.0.0
21
+ uvicorn>=0.23.0,<1.0.0
src/__pycache__/agent.cpython-314.pyc ADDED
Binary file (2.53 kB). View file
 
src/__pycache__/app.cpython-314.pyc ADDED
Binary file (33.5 kB). View file
 
src/__pycache__/config.cpython-314.pyc ADDED
Binary file (2.26 kB). View file
 
src/__pycache__/env.cpython-314.pyc ADDED
Binary file (16.9 kB). View file
 
src/__pycache__/environment.cpython-314.pyc ADDED
Binary file (3.68 kB). View file
 
src/__pycache__/models.cpython-314.pyc ADDED
Binary file (6 kB). View file
 
src/__pycache__/reward.cpython-314.pyc ADDED
Binary file (12.5 kB). View file
 
src/__pycache__/tasks.cpython-314.pyc ADDED
Binary file (9.9 kB). View file
 
src/agent.py ADDED
@@ -0,0 +1,67 @@
1
+ import random
2
+
3
+ # Q-table
4
+ Q = {}
5
+
6
+ # Actions
7
+ ACTIONS = [
8
+ "PASS",
9
+ "SCRAP",
10
+ "ROUTE_COMPONENT_REPLACEMENT",
11
+ "ROUTE_SOLDERING",
12
+ "ROUTE_DIAGNOSTICS",
13
+ "WAIT"
14
+ ]
15
+ def get_valid_actions(defect):
16
+ if defect == "none":
17
+ return ["PASS"]
18
+
19
+ if defect == "missing_component":
20
+ return ["ROUTE_COMPONENT_REPLACEMENT", "SCRAP"]
21
+
22
+ if defect == "solder_bridge":
23
+ return ["ROUTE_SOLDERING", "WAIT", "SCRAP"]
24
+
25
+ if defect == "short_circuit":
26
+ return ["SCRAP", "ROUTE_DIAGNOSTICS"]
27
+
28
+ return ["SCRAP"]
29
+
30
+ # Convert PCB → STATE
31
+ def get_state(pcb, factory):
32
+ slots_free = factory["soldering_slots"].count(0)
33
+
34
+ return (
35
+ pcb["defect_type"],
36
+ round(pcb["component_cost"] / 50), # bucket cost
37
+ round(pcb["criticality"], 1),
38
+ slots_free
39
+ )
40
+
41
+ # Initialize state
42
+ def init_state(state):
43
+ if state not in Q:
44
+ Q[state] = {a: 0 for a in ACTIONS}
45
+
46
+ # Epsilon-Greedy policy
47
+ def choose_action(state, epsilon=0.3):
48
+ init_state(state)
49
+
50
+ defect = state[0]
51
+ valid_actions = get_valid_actions(defect)
52
+
53
+ # Exploration
54
+ if random.random() < epsilon:
55
+ return random.choice(valid_actions)
56
+
57
+ # Exploitation (best action among valid ones)
58
+ return max(valid_actions, key=lambda a: Q[state][a])
59
+
60
+ # Q-learning update
61
+ def update_q(state, action, reward, next_state, alpha=0.1, gamma=0.9):
62
+ init_state(next_state)
63
+
64
+ old = Q[state][action]
65
+ future = max(Q[next_state].values())
66
+
67
+ Q[state][action] = old + alpha * (reward + gamma * future - old)
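Note on the Q-learning update in `src/agent.py`: the rule `Q[s][a] = old + alpha * (reward + gamma * max(Q[s']) - old)` can be traced with a small worked example. The sketch below re-implements the same update against a local table instead of the module-level `Q`, so it runs on its own. `make_q_table`, `q_update`, and the example states are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

ACTIONS = [
    "PASS",
    "SCRAP",
    "ROUTE_COMPONENT_REPLACEMENT",
    "ROUTE_SOLDERING",
    "ROUTE_DIAGNOSTICS",
    "WAIT",
]


def make_q_table():
    """Q-table whose unseen states start with 0.0 for every action."""
    return defaultdict(lambda: {a: 0.0 for a in ACTIONS})


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update and return the new value."""
    old = q[state][action]
    future = max(q[next_state].values())
    q[state][action] = old + alpha * (reward + gamma * future - old)
    return q[state][action]


q = make_q_table()
# State tuples follow get_state(): (defect_type, cost bucket, criticality, slots_free).
s = ("short_circuit", 2, 0.7, 1)
s_next = ("none", 1, 0.3, 1)

first = q_update(q, s, "SCRAP", 0.8, s_next)   # 0 + 0.1 * (0.8 + 0.9 * 0 - 0)
second = q_update(q, s, "SCRAP", 0.8, s_next)  # moves a further 10% toward 0.8
```

With `alpha=0.1`, each repeated update closes 10% of the remaining gap to the target, which is why the second value is larger than the first but still well below the reward of 0.8.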
src/api.py ADDED
@@ -0,0 +1,58 @@
1
+ from fastapi import FastAPI, HTTPException
2
+ import sys
3
+ import os
4
+
5
+ # Add src to path so standard imports work
6
+ sys.path.insert(0, os.path.dirname(__file__))
7
+
8
+ from env import SpectraQualEnv
9
+ from models import PCBAction, StepResult
10
+
11
+ app = FastAPI(
12
+ title="SpectraQual OpenEnv API",
13
+ description="REST API for automated OpenEnv space evaluation",
14
+ version="1.0.0"
15
+ )
16
+
17
+ # Initialize a default environment instance
18
+ # A deployed evaluator may create isolated environments per session,
19
+ # but a single global instance is enough for the "ping Space URL" test.
20
+ env_instance = SpectraQualEnv(task_id="task_easy")
21
+
22
+ @app.get("/")
23
+ def health_check():
24
+ """Returns 200 to pass automated ping test."""
25
+ return {"status": "ok", "environment": "SpectraQual"}
26
+
27
+ @app.post("/reset")
28
+ def reset_env() -> StepResult:
29
+ """Reset the environment and return initial observation."""
30
+ try:
31
+ return env_instance.reset()
32
+ except Exception as e:
33
+ raise HTTPException(status_code=500, detail=str(e))
34
+
35
+ @app.post("/step")
36
+ def step_env(action: PCBAction) -> StepResult:
37
+     """Take a step in the environment."""
38
+     if env_instance.state()["done"]:
39
+         # Episode finished: report a client error instead of stepping.
40
+         # Raised outside the try block so it is not rewrapped as a 500.
41
+         raise HTTPException(status_code=400, detail="Episode is done. Call /reset first.")
42
+     try:
43
+         return env_instance.step(action)
44
+     except Exception as e:
45
+         raise HTTPException(status_code=500, detail=str(e))
46
+
47
+ @app.get("/state")
48
+ def get_state():
49
+ """Return the internal state of the environment."""
50
+ try:
51
+ return env_instance.state()
52
+ except Exception as e:
53
+ raise HTTPException(status_code=500, detail=str(e))
54
+
55
+ if __name__ == "__main__":
56
+ import uvicorn
57
+ # Typically run via Docker CMD: uvicorn src.api:app --host 0.0.0.0 --port 7860
58
+ uvicorn.run(app, host="0.0.0.0", port=7860)
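Note on calling the API in `src/api.py`: a client could drive the `/step` endpoint with the standard library alone, roughly as sketched here. The JSON body assumes `PCBAction` needs only an `action` field, which matches how `inference.py` constructs it; `src/models.py` is not shown in this diff, so any additional fields are unknown. `build_step_request` and `send` are hypothetical helpers, and the base URL is whatever host the container is published on.

```python
import json
import urllib.request


def build_step_request(base_url: str, action: str) -> urllib.request.Request:
    """Build (but do not send) a POST /step request carrying one action."""
    body = json.dumps({"action": action}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/step",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

A typical session would POST to `/reset` first, then call `send(build_step_request("http://localhost:7860", "SCRAP"))` until the returned result reports `done`. Note that `urlopen` raises `urllib.error.HTTPError` on 4xx and 5xx responses, so a caller stepping a finished episode should expect an exception rather than a return value.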
src/app.py ADDED
@@ -0,0 +1,674 @@
1
+ """
2
+ app.py — SpectraQual Streamlit Dashboard (v3.0)
3
+ Updated to use the new SpectraQualEnv class with OpenEnv interface.
4
+ Features:
5
+ - Real-time stacked reward component charts
6
+ - Per-step accuracy / throughput display
7
+ - Action confidence from reward components
8
+ - Anomaly flag indicators
9
+ - Explainability: "Why this decision?"
10
+ """
11
+
12
+ import sys
13
+ import os
14
+ sys.path.insert(0, os.path.dirname(__file__))
15
+
16
+ import streamlit as st
17
+ import matplotlib.pyplot as plt
18
+ import time
19
+
20
+ from env import SpectraQualEnv
21
+ from models import PCBAction
22
+ from config import (
23
+ COLOR_PRIMARY, COLOR_SUCCESS, COLOR_WARNING,
24
+ COLOR_DANGER, COLOR_BG, COLOR_CARD, COLOR_MUTED,
25
+ TASKS,
26
+ )
27
+
28
+ # ---------------------------
29
+ # PAGE CONFIG
30
+ # ---------------------------
31
+ st.set_page_config(
32
+ page_title="SpectraQual",
33
+ page_icon="⚔️",
34
+ layout="wide",
35
+ initial_sidebar_state="collapsed",
36
+ )
37
+
38
+ # ---------------------------
39
+ # GLOBAL STYLES
40
+ # ---------------------------
41
+ st.markdown("""
42
+ <style>
43
+ @import url('https://fonts.googleapis.com/css2?family=Share+Tech+Mono&family=Rajdhani:wght@500;600;700&family=Exo+2:wght@300;400;600;800&display=swap');
44
+
45
+ .stApp {
46
+ background-color: #080c12;
47
+ color: #c9d4e0;
48
+ font-family: 'Exo 2', sans-serif;
49
+ }
50
+ .stApp::before {
51
+ content: '';
52
+ position: fixed;
53
+ inset: 0;
54
+ background: repeating-linear-gradient(0deg, rgba(0,0,0,0.025) 0px, rgba(0,0,0,0.025) 1px, transparent 1px, transparent 4px);
55
+ pointer-events: none;
56
+ z-index: 9999;
57
+ }
58
+ h1 {
59
+ font-family: 'Rajdhani', sans-serif !important;
60
+ font-weight: 700 !important;
61
+ font-size: 2.4rem !important;
62
+ letter-spacing: 0.12em !important;
63
+ color: #00e5ff !important;
64
+ text-shadow: 0 0 18px rgba(0,229,255,0.45), 0 0 40px rgba(0,229,255,0.12);
65
+ border-bottom: 1px solid rgba(0,229,255,0.15);
66
+ padding-bottom: 0.4rem;
67
+ }
68
+ h2, h3 {
69
+ font-family: 'Rajdhani', sans-serif !important;
70
+ font-weight: 600 !important;
71
+ font-size: 0.72rem !important;
72
+ letter-spacing: 0.14em !important;
73
+ color: #2e6a80 !important;
74
+ text-transform: uppercase;
75
+ margin-top: 1.4rem !important;
76
+ margin-bottom: 0.3rem !important;
77
+ }
78
+ [data-testid="metric-container"] {
79
+ background: linear-gradient(135deg, #0d1b2a, #09141f);
80
+ border: 1px solid rgba(0,229,255,0.15);
81
+ border-radius: 10px;
82
+ padding: 16px 20px !important;
83
+ box-shadow: 0 0 22px rgba(0,229,255,0.05), inset 0 1px 0 rgba(255,255,255,0.03);
84
+ transition: border-color 0.25s;
85
+ }
86
+ [data-testid="metric-container"]:hover { border-color: rgba(0,229,255,0.38); }
87
+ [data-testid="stMetricLabel"] {
88
+ font-family: 'Share Tech Mono', monospace !important;
89
+ font-size: 0.68rem !important;
90
+ color: #2e6a80 !important;
91
+ letter-spacing: 0.12em;
92
+ text-transform: uppercase;
93
+ }
94
+ [data-testid="stMetricValue"] {
95
+ font-family: 'Rajdhani', sans-serif !important;
96
+ font-size: 2.1rem !important;
97
+ font-weight: 700 !important;
98
+ color: #00e5ff !important;
99
+ }
100
+ .stButton > button {
101
+ background: linear-gradient(135deg, #0d2137, #091824);
102
+ color: #00e5ff;
103
+ border: 1px solid rgba(0,229,255,0.3);
104
+ border-radius: 6px;
105
+ font-family: 'Rajdhani', sans-serif;
106
+ font-weight: 600;
107
+ letter-spacing: 0.1em;
108
+ font-size: 0.85rem;
109
+ padding: 9px 18px;
110
+ text-transform: uppercase;
111
+ transition: all 0.2s;
112
+ box-shadow: 0 0 10px rgba(0,229,255,0.06);
113
+ width: 100%;
114
+ }
115
+ .stButton > button:hover {
116
+ background: linear-gradient(135deg, #123450, #0d2538);
117
+ border-color: #00e5ff;
118
+ box-shadow: 0 0 18px rgba(0,229,255,0.22);
119
+ transform: translateY(-1px);
120
+ }
121
+ .stButton > button:active { transform: translateY(0); }
122
+ .stSuccess, .stWarning, .stInfo, .stError {
123
+ border-radius: 8px !important;
124
+ font-family: 'Rajdhani', sans-serif !important;
125
+ font-size: 1.1rem !important;
126
+ font-weight: 600 !important;
127
+ letter-spacing: 0.05em;
128
+ border-left-width: 4px !important;
129
+ }
130
+ .stSuccess { background: rgba(0,230,118,0.07) !important; border-color: #00e676 !important; }
131
+ .stWarning { background: rgba(255,183,0,0.07) !important; border-color: #ffb700 !important; }
132
+ .stInfo { background: rgba(0,229,255,0.06) !important; border-color: #00e5ff !important; }
133
+ .stError { background: rgba(255,50,50,0.07) !important; border-color: #ff3232 !important; }
134
+ .pcb-card {
135
+ background: linear-gradient(135deg, #0d1b2a, #09141f);
136
+ border: 1px solid rgba(0,229,255,0.15);
137
+ border-radius: 10px;
138
+ padding: 18px 22px;
139
+ font-family: 'Share Tech Mono', monospace;
140
+ font-size: 0.82rem;
141
+ line-height: 2.1;
142
+ box-shadow: inset 0 0 24px rgba(0,0,0,0.25);
143
+ }
144
+ .lbl { color: #2e6a80; font-size: 0.68rem; letter-spacing: 0.12em; text-transform: uppercase; }
145
+ .val { color: #c9f0ff; font-weight: 600; }
146
+ .defect-badge {
147
+ display: inline-block;
148
+ padding: 1px 10px;
149
+ border-radius: 4px;
150
+ font-size: 0.72rem;
151
+ font-weight: 700;
152
+ letter-spacing: 0.08em;
153
+ }
154
+ .b-none { background: rgba(0,230,118,0.12); color: #00e676; border: 1px solid #00e676; }
155
+ .b-missing { background: rgba(255,183,0,0.12); color: #ffb700; border: 1px solid #ffb700; }
156
+ .b-solder { background: rgba(255,120,0,0.12); color: #ff7800; border: 1px solid #ff7800; }
157
+ .b-short { background: rgba(255,50,50,0.12); color: #ff3232; border: 1px solid #ff3232; }
158
+ .anomaly-badge {
159
+ display: inline-block;
160
+ padding: 2px 12px;
161
+ border-radius: 4px;
162
+ font-size: 0.72rem;
163
+ font-weight: 700;
164
+ background: rgba(255,0,200,0.12);
165
+ color: #ff00c8;
166
+ border: 1px solid #ff00c8;
167
+ letter-spacing: 0.1em;
168
+ animation: anomalyPulse 1.2s ease-in-out infinite;
169
+ }
170
+ @keyframes anomalyPulse {
171
+ 0% { box-shadow: 0 0 4px rgba(255,0,200,0.2); }
172
+ 50% { box-shadow: 0 0 16px rgba(255,0,200,0.6); }
173
+ 100% { box-shadow: 0 0 4px rgba(255,0,200,0.2); }
174
+ }
175
+ .slot-grid { display: flex; flex-wrap: wrap; gap: 8px; margin-top: 4px; }
176
+ .slot-item {
177
+ display: flex; align-items: center; gap: 8px;
178
+ background: #0a1420; border-radius: 6px;
179
+ padding: 7px 13px;
180
+ font-family: 'Share Tech Mono', monospace;
181
+ font-size: 0.75rem;
182
+ border: 1px solid rgba(255,255,255,0.05);
183
+ min-width: 128px;
184
+ }
185
+ .dot { width:9px; height:9px; border-radius:50%; flex-shrink:0; }
186
+ .dot-free { background:#00e676; box-shadow:0 0 7px #00e676; }
187
+ .dot-busy { background:#ff3232; box-shadow:0 0 7px #ff3232; }
188
+ .dot-lock { background:#3a3a3a; }
189
+ .free { color:#00e676; }
190
+ .busy { color:#ff5a5a; }
191
+ .lock { color:#3a3a3a; }
192
+ .rpill {
193
+ display: inline-block;
194
+ padding: 5px 20px;
195
+ border-radius: 20px;
196
+ font-family: 'Rajdhani', sans-serif;
197
+ font-size: 1.3rem;
198
+ font-weight: 700;
199
+ letter-spacing: 0.08em;
200
+ }
201
+ .rpos { background:rgba(0,230,118,0.11); color:#00e676; border:1px solid rgba(0,230,118,0.35); }
202
+ .rneg { background:rgba(255,50,50,0.11); color:#ff5a5a; border:1px solid rgba(255,50,50,0.35); }
203
+ .rzero { background:rgba(140,140,140,0.09);color:#888; border:1px solid rgba(140,140,140,0.25); }
204
+ .score-big {
205
+ font-family: 'Rajdhani', sans-serif;
206
+ font-size: 2.4rem;
207
+ font-weight: 800;
208
+ letter-spacing: 0.05em;
209
+ text-shadow: 0 0 14px currentColor;
210
+ }
211
+ hr { border:none; border-top:1px solid rgba(0,229,255,0.08) !important; margin:1.2rem 0 !important; }
212
+ .idle {
213
+ text-align:center; padding:44px 20px;
214
+ border:1px dashed rgba(0,229,255,0.15); border-radius:12px;
215
+ color:#1e4a5a; font-family:'Share Tech Mono',monospace;
216
+ font-size:0.8rem; letter-spacing:0.12em; margin-top:36px;
217
+ }
218
+ .reward-row {
219
+ display: flex; align-items: center; gap: 10px;
220
+ font-family: 'Share Tech Mono', monospace;
221
+ font-size: 0.74rem;
222
+ margin-bottom: 6px;
223
+ }
224
+ .reward-label { color: #2e6a80; width: 160px; flex-shrink: 0; }
225
+ .reward-bar-wrap { flex: 1; background: #0a1420; border-radius: 4px; height: 8px; }
226
+ .reward-bar { height: 8px; border-radius: 4px; }
227
+ .reward-val { color: #c9f0ff; width: 48px; text-align: right; }
228
+ [data-testid="stProgressBar"] > div > div {
229
+ background: linear-gradient(90deg, #0d5e70, #00e5ff) !important;
230
+ border-radius: 4px;
231
+ }
232
+ [data-testid="stProgressBar"] {
233
+ background: #0a1420 !important;
234
+ border: 1px solid rgba(0,229,255,0.12);
235
+ border-radius: 4px;
236
+ }
237
+ .stCaption {
238
+ font-family: 'Share Tech Mono', monospace !important;
239
+ font-size: 0.68rem !important;
240
+ color: #2e6a80 !important;
241
+ letter-spacing: 0.1em;
242
+ }
243
+ @keyframes pulseGlow {
244
+ 0% { box-shadow: 0 0 5px rgba(0,229,255,0.15); }
245
+ 50% { box-shadow: 0 0 22px rgba(0,229,255,0.45); }
246
+ 100% { box-shadow: 0 0 5px rgba(0,229,255,0.15); }
247
+ }
248
+ .stSuccess, .stWarning, .stError, .stInfo {
249
+ animation: pulseGlow 1.5s ease-in-out infinite;
250
+ }
251
+ </style>
252
+ """, unsafe_allow_html=True)
253
+
254
+ # ---------------------------
255
+ # SESSION STATE
256
+ # ---------------------------
257
+ def _init_state():
258
+ if "env" not in st.session_state:
259
+ st.session_state.env = None
260
+ if "score" not in st.session_state:
261
+ st.session_state.score = 0.0
262
+ if "history" not in st.session_state:
263
+ st.session_state.history = [] # cumulative reward over time
264
+ if "running" not in st.session_state:
265
+ st.session_state.running = False
266
+ if "log" not in st.session_state:
267
+ st.session_state.log = [] # list of (pcb_obs, action, rc)
268
+ if "task_id" not in st.session_state:
269
+ st.session_state.task_id = "task_easy"
270
+ if "last_result" not in st.session_state:
271
+ st.session_state.last_result = None
272
+ if "episode_done" not in st.session_state:
273
+ st.session_state.episode_done = False
274
+
275
+ _init_state()
276
+
277
+ # ---------------------------
278
+ # HELPERS
279
+ # ---------------------------
280
+ def defect_badge(d):
281
+ m = {
282
+ "none": ("b-none", "✓ NONE"),
283
+ "missing_component": ("b-missing", "⚠ MISSING COMPONENT"),
284
+ "solder_bridge": ("b-solder", "⚡ SOLDER BRIDGE"),
285
+ "short_circuit": ("b-short", "✗ SHORT CIRCUIT"),
286
+ }
287
+ cls, label = m.get(d, ("b-none", d.upper()))
288
+ return f'<span class="defect-badge {cls}">{label}</span>'
289
+
290
+
291
+ def reward_bar_html(label, score, color="#00e5ff"):
292
+     pct = max(0, min(100, int(score * 100)))  # clamp so the CSS width stays within 0-100%
293
+ return (
294
+ f'<div class="reward-row">'
295
+ f' <span class="reward-label">{label}</span>'
296
+ f' <div class="reward-bar-wrap">'
297
+ f' <div class="reward-bar" style="width:{pct}%;background:{color};"></div>'
298
+ f' </div>'
299
+ f' <span class="reward-val">{score:.2f}</span>'
300
+ f'</div>'
301
+ )
302
+
303
+
304
+ def get_env() -> SpectraQualEnv:
305
+ if st.session_state.env is None:
306
+ st.session_state.env = SpectraQualEnv(task_id=st.session_state.task_id)
307
+ return st.session_state.env
308
+
309
+
310
+ # ---------------------------
311
+ # HEADER
312
+ # ---------------------------
313
+ st.title("⚔️ SPECTRAQUAL — SMART PCB DECISION SYSTEM")
314
+ st.markdown(
315
+ '<p style="font-family:\'Share Tech Mono\',monospace;font-size:0.72rem;'
316
+ 'color:#1e4a5a;letter-spacing:0.16em;margin-top:-10px;margin-bottom:4px;">'
317
+ 'REAL-TIME QUALITY INTELLIGENCE ENGINE · v3.0 · OpenEnv Compliant</p>',
318
+ unsafe_allow_html=True,
319
+ )
320
+
321
+ # ---------------------------
322
+ # SIDEBAR TASK SELECTOR
323
+ # ---------------------------
324
+ with st.sidebar:
325
+ st.markdown("### 🎯 Task Selection")
326
+ task_choice = st.selectbox(
327
+ "Select Task",
328
+ options=list(TASKS.keys()),
329
+ format_func=lambda t: f"{t} ({TASKS[t]['difficulty'].upper()})",
330
+ index=list(TASKS.keys()).index(st.session_state.task_id),
331
+ )
332
+ if task_choice != st.session_state.task_id:
333
+ st.session_state.task_id = task_choice
334
+ st.session_state.env = None
335
+ st.session_state.score = 0.0
336
+ st.session_state.history = []
337
+ st.session_state.log = []
338
+ st.session_state.last_result = None
339
+ st.session_state.episode_done = False
340
+
341
+ cfg = TASKS[st.session_state.task_id]
342
+ st.markdown(f"""
343
+ **Boards:** {cfg['n_boards']}
344
+ **Slots:** {cfg['n_slots']}
345
+ **Seed:** {cfg['seed']}
346
+ **Anomaly Rate:** {cfg['anomaly_rate']:.0%}
347
+ **Difficulty:** {cfg['difficulty'].upper()}
348
+ """)
349
+ st.markdown("---")
350
+ speed = st.slider("⚡ Speed (s/step)", 0.2, 2.0, 0.8, step=0.1)
351
+
352
+ # ---------------------------
353
+ # SPEED (fallback if sidebar collapsed)
354
+ # ---------------------------
355
+ if "speed" not in dir():
356
+ speed = 0.8
357
+
358
+ st.markdown("<hr>", unsafe_allow_html=True)
359
+
360
+ # ---------------------------
361
+ # METRICS BAR
362
+ # ---------------------------
363
+ env_obj = get_env()
364
+ state = env_obj.state()
365
+
366
+ m1, m2, m3, m4, m5 = st.columns(5)
367
+ m1.metric("💰 Cumul. Reward", f"{state['cumulative_reward']:.3f}")
368
+ m2.metric("🎯 Accuracy", f"{state['rolling_accuracy']:.1%}")
369
+ m3.metric("⚙️ Active Slots", sum(1 for s in state['slots'] if 0 < s < 9999))
370
+ m4.metric("🧠 Decisions", state['total_count'])
371
+ m5.metric("⚠️ Bottlenecks", state['bottleneck_count'])
372
+
373
+ last_r = round(st.session_state.log[-1][2].normalized, 3) if st.session_state.log else "N/A"
374
+ status_color = "#00e5ff" if st.session_state.log else "#1e4a5a"
375
+ st.markdown(f"""
376
+ <div style="font-family:'Share Tech Mono',monospace;font-size:0.75rem;
377
+ color:{status_color};padding:6px 14px;border:1px solid rgba(0,229,255,0.2);
378
+ border-radius:6px;display:inline-block;margin-top:10px;margin-bottom:4px;
379
+ background:rgba(0,229,255,0.03);letter-spacing:0.1em;">
380
+ 🟢 TASK: {st.session_state.task_id.upper()} &nbsp;·&nbsp; LAST REWARD: {last_r} &nbsp;·&nbsp; STEPS: {state['step']}
381
+ </div>
382
+ """, unsafe_allow_html=True)
383
+
384
+ st.markdown("<hr>", unsafe_allow_html=True)
385
+
386
+ # ---------------------------
387
+ # CONTROL BUTTONS
388
+ # ---------------------------
389
+ c1, c2, c3, c4, c5 = st.columns(5)
390
+ with c1:
391
+ if st.button("▶ RUN STEP"):
392
+ st.session_state.running = False
393
+ st.session_state.run_once = True
394
+ with c2:
395
+ if st.button("⚡ AUTO RUN"):
396
+ st.session_state.running = True
397
+ with c3:
398
+ if st.button("⛔ STOP"):
399
+ st.session_state.running = False
400
+ with c4:
401
+ if st.button("🔄 RESET"):
402
+ env_obj.reset()
403
+ st.session_state.score = 0.0
404
+ st.session_state.history = []
405
+ st.session_state.log = []
406
+ st.session_state.last_result = None
407
+ st.session_state.episode_done = False
408
+ with c5:
409
+ if st.button("🆕 NEW TASK"):
410
+ st.session_state.env = None
411
+ st.session_state.score = 0.0
412
+ st.session_state.history = []
413
+ st.session_state.log = []
414
+ st.session_state.last_result = None
415
+ st.session_state.episode_done = False
416
+
417
+ # ---------------------------
418
+ # CORE STEP
419
+ # ---------------------------
420
+ def run_step():
421
+ env = get_env()
422
+
423
+ # Initialize if needed
424
+ if env._done or env._current_pcb is None:
425
+ result = env.reset()
426
+ if result.done:
427
+ st.session_state.episode_done = True
428
+ return None
429
+
430
+     # Rebuild the current observation so the heuristic can choose an action
431
+     from reward import detect_anomaly
432
+     obs = env._build_observation(*detect_anomaly(env._current_pcb))
433
+     # Use rule-based decision (greedy heuristic)
434
+     from env import decide_action
435
+ pcb_dict = {
436
+ "defect_type": obs.defect_type,
437
+ "component_cost": obs.component_cost,
438
+ "criticality": obs.criticality,
439
+ }
440
+ action_str = decide_action(pcb_dict)
441
+
442
+ result = env.step(PCBAction(action=action_str))
443
+ rc = result.reward_components
444
+
445
+ st.session_state.score = env.state()["cumulative_reward"]
446
+ st.session_state.history.append(st.session_state.score)
447
+ st.session_state.log.append((result.observation, action_str, rc))
448
+ st.session_state.last_result = result
449
+
450
+ if result.done:
451
+ st.session_state.episode_done = True
452
+
453
+ return result
454
+
455
+
456
+ # ---------------------------
457
+ # DISPLAY
458
+ # ---------------------------
459
+ def display(result):
460
+ from collections import Counter
461
+
462
+ obs = result.observation
463
+ rc = result.reward_components
464
+ col1, col2 = st.columns(2, gap="large")
465
+
466
+ # ── LEFT ──
467
+ with col1:
468
+ st.subheader("PCB Info")
469
+ anomaly_html = ""
470
+ if obs.is_anomaly:
471
+ anomaly_html = f'<span class="anomaly-badge">⚠️ ANOMALY {obs.anomaly_score:.2f}</span>'
472
+
473
+ st.markdown(f"""
474
+ <div class="pcb-card">
475
+ <div><span class="lbl">Board ID &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span>
476
+ <span class="val">{obs.board_id}</span></div>
477
+ <div><span class="lbl">Defect Type &nbsp;&nbsp;</span>
478
+ {defect_badge(obs.defect_type)}</div>
479
+ <div><span class="lbl">Component Cost </span>
480
+ <span class="val">₹{obs.component_cost:.2f}</span></div>
481
+ <div><span class="lbl">Criticality &nbsp;&nbsp;&nbsp;</span>
482
+ <span class="val">{obs.criticality:.2f}</span></div>
483
+ <div><span class="lbl">Anomaly &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span>
484
+ {anomaly_html if anomaly_html else '<span class="val" style="color:#2e6a80;">Normal</span>'}</div>
485
+ </div>
486
+ """, unsafe_allow_html=True)
487
+
488
+ st.subheader("Decision")
489
+ action = st.session_state.log[-1][1] if st.session_state.log else "N/A"
490
+ if action == "PASS":
491
+ st.success(f"✅ {action}")
492
+ elif "ROUTE" in action:
493
+ st.warning(f"🛠️ {action}")
494
+ elif action == "WAIT":
495
+ st.warning("⏳ WAITING FOR SLOT AVAILABILITY")
496
+ else:
497
+ st.error(f"❌ {action}")
498
+
499
+ if rc:
500
+ st.subheader("🧠 Why this decision?")
501
+ explanation_parts = rc.explanation.split(" | ")
502
+ for part in explanation_parts[:3]:
503
+ st.info(part)
504
+
505
+ st.subheader("Step Reward")
506
+ r = result.reward
507
+ if r >= 0.6:
508
+ st.markdown(f'<span class="rpill rpos">▲ {r:.4f}</span>', unsafe_allow_html=True)
509
+ elif r >= 0.35:
510
+ st.markdown(f'<span class="rpill rzero">● {r:.4f}</span>', unsafe_allow_html=True)
511
+ else:
512
+ st.markdown(f'<span class="rpill rneg">▼ {r:.4f}</span>', unsafe_allow_html=True)
513
+
514
+ if rc:
515
+ st.subheader("📊 Reward Component Breakdown")
516
+ components = [
517
+ ("Defect Handling", rc.defect_reward, "#00e5ff"),
518
+ ("Cost Efficiency", rc.cost_efficiency, "#00e676"),
519
+ ("Queue Mgmt", rc.queue_penalty, "#ffb700"),
520
+ ("Risk Factor", rc.criticality_factor, "#ff7800"),
521
+ ("Anomaly Bonus", rc.anomaly_bonus, "#ff00c8"),
522
+ ]
523
+ bars_html = ""
524
+ for label, val, color in components:
525
+ bars_html += reward_bar_html(label, val, color)
526
+ st.markdown(bars_html, unsafe_allow_html=True)
527
+
528
+ st.subheader("Rolling Metrics")
529
+ sub1, sub2 = st.columns(2)
530
+ with sub1:
531
+ st.metric("🎯 Accuracy", f"{obs.rolling_accuracy:.1%}")
532
+ with sub2:
533
+ st.metric("⚡ Throughput", f"{obs.throughput:.2f}")
534
+
535
+ # ── RIGHT ──
536
+ with col2:
537
+ st.subheader("Factory Slots")
538
+ slot_html = '<div class="slot-grid">'
539
+ for i, slot in enumerate(obs.slots_state):
540
+ if slot == -1:
541
+ slot_html += (f'<div class="slot-item"><div class="dot dot-lock"></div>'
542
+ f'<span class="lock">SLOT {i:02d} · LOCKED</span></div>')
543
+ elif slot > 0:
544
+ slot_html += (f'<div class="slot-item"><div class="dot dot-busy"></div>'
545
+ f'<span class="busy">SLOT {i:02d} · {slot}t</span></div>')
546
+ else:
547
+ slot_html += (f'<div class="slot-item"><div class="dot dot-free"></div>'
548
+ f'<span class="free">SLOT {i:02d} · FREE</span></div>')
549
+ slot_html += '</div>'
550
+ st.markdown(slot_html, unsafe_allow_html=True)
551
+
552
+ st.subheader("Cumulative Reward")
553
+ score_color = "#00e676" if st.session_state.score >= 0.5 else "#ff5a5a"
554
+ st.markdown(
555
+ f'<div class="score-big" style="color:{score_color}">'
556
+ f'{st.session_state.score:.4f}</div>',
557
+ unsafe_allow_html=True,
558
+ )
559
+
560
+ st.subheader("📈 Reward Trend")
561
+ fig, ax = plt.subplots(figsize=(5.5, 3))
562
+ fig.patch.set_facecolor("#080c12")
563
+ ax.set_facecolor("#0a1420")
564
+ history = st.session_state.history
565
+ if history:
566
+ ax.plot(history, color="#00e5ff", linewidth=1.8,
567
+ marker='o', markersize=3.5,
568
+ markerfacecolor="#00e5ff", markeredgewidth=0)
569
+ ax.fill_between(range(len(history)), history, alpha=0.10, color="#00e5ff")
570
+ ax.axhline(y=0.6, color="#00e676", linewidth=0.8, linestyle="--", alpha=0.5, label="Success threshold")
571
+ ax.set_title("Cumulative Reward", color="#2e6a80", fontsize=9, pad=8)
572
+ ax.set_xlabel("Steps", color="#2e6a80", fontsize=8)
573
+ ax.set_ylabel("Score", color="#2e6a80", fontsize=8)
574
+ ax.set_ylim(0, max(max(history, default=1.0) * 1.1, 1.0))
575
+ ax.tick_params(colors="#2e6a80", labelsize=7)
576
+ ax.grid(color="#0d2535", linewidth=0.7, linestyle="--")
577
+ for spine in ax.spines.values():
578
+ spine.set_edgecolor("#0d2535")
579
+ fig.tight_layout(pad=1.2)
580
+ st.pyplot(fig)
581
+ plt.close(fig)
582
+
583
+ # Stacked Reward Components Over Time
584
+ if len(st.session_state.log) >= 2:
585
+ st.subheader("📊 Component Breakdown Over Time")
586
+ steps_data = st.session_state.log[-20:] # last 20 steps
587
+ comp_labels = ["Defect", "Cost", "Queue", "Risk", "Anomaly"]
588
+ comp_colors = ["#00e5ff", "#00e676", "#ffb700", "#ff7800", "#ff00c8"]
589
+ comp_data = {l: [] for l in comp_labels}
590
+
591
+ for _, _, rc_entry in steps_data:
592
+ if rc_entry:
593
+ comp_data["Defect"].append(rc_entry.defect_reward)
594
+ comp_data["Cost"].append(rc_entry.cost_efficiency)
595
+ comp_data["Queue"].append(rc_entry.queue_penalty)
596
+ comp_data["Risk"].append(rc_entry.criticality_factor)
597
+ comp_data["Anomaly"].append(rc_entry.anomaly_bonus)
598
+
599
+ if any(comp_data.values()):
600
+ fig2, ax2 = plt.subplots(figsize=(5.5, 2.8))
601
+ fig2.patch.set_facecolor("#080c12")
602
+ ax2.set_facecolor("#0a1420")
603
+ x = list(range(len(next(iter(comp_data.values())))))
604
+ bottom = [0.0] * len(x)
605
+ for label, color in zip(comp_labels, comp_colors):
606
+ vals = comp_data[label]
607
+ if vals and len(vals) == len(x):
608
+ # Scale every component by a uniform 0.2 (equal fifths), not by the REWARD_WEIGHT_* values in config.py
609
+ ax2.fill_between(x, bottom,
610
+ [b + v * 0.2 for b, v in zip(bottom, vals)],
611
+ alpha=0.6, color=color, label=label)
612
+ bottom = [b + v * 0.2 for b, v in zip(bottom, vals)]
613
+ ax2.set_title("Reward Components (last 20 steps)", color="#2e6a80", fontsize=8, pad=6)
614
+ ax2.set_xlabel("Steps", color="#2e6a80", fontsize=7)
615
+ ax2.tick_params(colors="#2e6a80", labelsize=6)
616
+ ax2.grid(color="#0d2535", linewidth=0.5, linestyle="--")
617
+ for spine in ax2.spines.values():
618
+ spine.set_edgecolor("#0d2535")
619
+ ax2.legend(loc="upper right", fontsize=6,
620
+ facecolor="#080c12", edgecolor="#2e6a80", labelcolor="#c9d4e0")
621
+ fig2.tight_layout(pad=1.0)
622
+ st.pyplot(fig2)
623
+ plt.close(fig2)
624
+
625
+ # Decision Distribution
626
+ if st.session_state.log:
627
+ st.subheader("📊 Decision Distribution")
628
+ decisions = [entry[1] for entry in st.session_state.log]
629
+ from collections import Counter
630
+ counts = dict(Counter(decisions))
631
+ st.bar_chart(counts)
632
+
633
+ # Episode Done banner
634
+ if st.session_state.episode_done:
635
+ final = st.session_state.score
636
+ if final >= 0.6:
637
+ st.success(f"🏆 EPISODE COMPLETE — Score: {final:.4f} — SUCCESS!")
638
+ else:
639
+ st.warning(f"⚠️ EPISODE COMPLETE — Score: {final:.4f} — Below success threshold (0.60)")
640
+
641
+
642
+ # ---------------------------
643
+ # EXECUTION
644
+ # ---------------------------
645
+ if "run_once" in st.session_state and st.session_state.run_once:
646
+ result = run_step()
647
+ if result:
648
+ display(result)
649
+ st.session_state.run_once = False
650
+
651
+ elif st.session_state.running:
652
+ placeholder = st.empty()
653
+ for _ in range(1000):
654
+ if not st.session_state.running:
655
+ break
656
+ if st.session_state.episode_done:
657
+ st.session_state.running = False
658
+ break
659
+ result = run_step()
660
+ if result:
661
+ with placeholder.container():
662
+ display(result)
663
+ time.sleep(speed)
664
+
665
+ elif st.session_state.last_result:
666
+ display(st.session_state.last_result)
667
+
668
+ else:
669
+ st.markdown("""
670
+ <div class="idle">
671
+ [ SYSTEM IDLE ]<br><br>
672
+ SELECT A TASK IN THE SIDEBAR &nbsp; · &nbsp; PRESS &nbsp; ▶ RUN STEP &nbsp; OR &nbsp; ⚡ AUTO RUN &nbsp; TO BEGIN
673
+ </div>
674
+ """, unsafe_allow_html=True)
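Note on the "Component Breakdown Over Time" chart: it stacks the five reward components, scaling each one by a uniform 0.2. A minimal sketch of the same stacking with per-component weights is given below. The weights are copied from the `REWARD_WEIGHT_*` constants in `src/config.py`; the helper `stack_bands` and the sample series are hypothetical and only illustrate the arithmetic.

```python
from typing import Dict, List, Tuple

# Copied from the REWARD_WEIGHT_* constants in src/config.py.
WEIGHTS: Dict[str, float] = {
    "Defect": 0.35,
    "Cost": 0.25,
    "Queue": 0.20,
    "Risk": 0.10,
    "Anomaly": 0.10,
}

Band = Tuple[List[float], List[float]]


def stack_bands(series: Dict[str, List[float]],
                weights: Dict[str, float]) -> Dict[str, Band]:
    """Return a (lower, upper) edge pair per component for a stacked area plot.

    Hypothetical helper: each component is scaled by its weight and stacked
    on top of the running total of the components before it.
    """
    n_steps = len(next(iter(series.values())))
    bottom = [0.0] * n_steps
    bands: Dict[str, Band] = {}
    for label, weight in weights.items():
        values = series[label]
        if len(values) != n_steps:
            raise ValueError(f"series '{label}' has {len(values)} points, expected {n_steps}")
        top = [b + weight * v for b, v in zip(bottom, values)]
        bands[label] = (bottom, top)
        bottom = top  # the next component starts where this one ends
    return bands


if __name__ == "__main__":
    # Sample data (made up): two steps, every component at its maximum of 1.0.
    sample = {label: [1.0, 1.0] for label in WEIGHTS}
    for label, (lower, upper) in stack_bands(sample, WEIGHTS).items():
        print(label, lower, upper)
```

Each `(lower, upper)` pair can be passed to `ax.fill_between(x, lower, upper)`, which is the call the dashboard already uses. With these weights the top edge of the last band equals the weighted sum of the components, which is not the case with a uniform 0.2.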
src/config.py ADDED
@@ -0,0 +1,116 @@
1
+ """
2
+ config.py — SpectraQual Centralized Configuration
3
+ All constants, reward weights, task definitions, and environment settings live here.
4
+ """
5
+
6
+ # ---------------------------
7
+ # DEFECT TYPES
8
+ # ---------------------------
9
+ DEFECT_TYPES = ["none", "missing_component", "solder_bridge", "short_circuit"]
10
+
11
+ # ---------------------------
12
+ # ACTION SPACE
13
+ # ---------------------------
14
+ ACTIONS = [
15
+ "PASS",
16
+ "SCRAP",
17
+ "ROUTE_COMPONENT_REPLACEMENT",
18
+ "ROUTE_SOLDERING",
19
+ "ROUTE_DIAGNOSTICS",
20
+ "WAIT",
21
+ ]
22
+
23
+ # Valid actions per defect type
24
+ VALID_ACTIONS = {
25
+ "none": ["PASS"],
26
+ "missing_component": ["ROUTE_COMPONENT_REPLACEMENT", "SCRAP"],
27
+ "solder_bridge": ["ROUTE_SOLDERING", "WAIT", "SCRAP"],
28
+ "short_circuit": ["SCRAP", "ROUTE_DIAGNOSTICS"],
29
+ }
30
+
31
+ # ---------------------------
32
+ # FACTORY SETTINGS
33
+ # ---------------------------
34
+ N_SOLDERING_SLOTS = 3 # Number of parallel soldering slots
35
+ SOLDERING_JOB_DURATION = 2 # Time units a soldering job occupies a slot
36
+
37
+ # ---------------------------
38
+ # PCB GENERATION BOUNDS
39
+ # ---------------------------
40
+ COMPONENT_COST_MIN = 10.0
41
+ COMPONENT_COST_MAX = 200.0
42
+ CRITICALITY_MIN = 0.1
43
+ CRITICALITY_MAX = 1.0
44
+
45
+ # Anomaly thresholds: a board exceeding either value is an anomaly candidate
46
+ ANOMALY_COST_THRESHOLD = 180.0 # cost > this → anomaly candidate
47
+ ANOMALY_CRITICALITY_THRESHOLD = 0.92 # criticality > this → anomaly candidate
48
+
49
+ # ---------------------------
50
+ # REWARD WEIGHTS (multi-component)
51
+ # ---------------------------
52
+ REWARD_WEIGHT_DEFECT = 0.35
53
+ REWARD_WEIGHT_COST = 0.25
54
+ REWARD_WEIGHT_QUEUE = 0.20
55
+ REWARD_WEIGHT_CRITICALITY = 0.10
56
+ REWARD_WEIGHT_ANOMALY = 0.10
57
+
58
+ # Raw reward scaling reference (used for normalization)
59
+ RAW_REWARD_MIN = -60.0
60
+ RAW_REWARD_MAX = 160.0
61
+
62
+ # ---------------------------
63
+ # TASK DEFINITIONS
64
+ # ---------------------------
65
+ TASKS = {
66
+ "task_easy": {
67
+ "id": "task_easy",
68
+ "description": "Triage 10 boards with no slot pressure. Focus: correct defect classification.",
69
+ "difficulty": "easy",
70
+ "n_boards": 10,
71
+ "seed": 42,
72
+ "n_slots": 3, # all slots always available
73
+ "anomaly_rate": 0.0,
74
+ },
75
+ "task_medium": {
76
+ "id": "task_medium",
77
+ "description": "Triage 15 boards with one soldering slot. Manage queue pressure.",
78
+ "difficulty": "medium",
79
+ "n_boards": 15,
80
+ "seed": 99,
81
+ "n_slots": 1, # only 1 slot → queue pressure
82
+ "anomaly_rate": 0.1,
83
+ },
84
+ "task_hard": {
85
+ "id": "task_hard",
86
+ "description": "Triage 20 boards with mixed anomalies and tight slot constraints.",
87
+ "difficulty": "hard",
88
+ "n_boards": 20,
89
+ "seed": 777,
90
+ "n_slots": 1,
91
+ "anomaly_rate": 0.25,
92
+ },
93
+ }
94
+
95
+ # Grader thresholds
96
+ MEDIUM_ECONOMIC_TARGET = 0.50 # 50% of max possible economic reward
97
+ HARD_ANOMALY_RATE_TARGET = 0.50 # must flag ≥50% of actual anomalies
98
+
99
+ # ---------------------------
100
+ # INFERENCE SCRIPT SETTINGS
101
+ # ---------------------------
102
+ MAX_STEPS_PER_TASK = 25 # safety cap (must fit in 20-min runtime)
103
+ SUCCESS_SCORE_THRESHOLD = 0.60 # ≥0.60 normalized score = success
104
+ TEMPERATURE = 0.2
105
+ MAX_TOKENS = 64 # actions are short, no need for long outputs
106
+
107
+ # ---------------------------
108
+ # UI COLOR PALETTE (for app.py)
109
+ # ---------------------------
110
+ COLOR_PRIMARY = "#00e5ff"
111
+ COLOR_SUCCESS = "#00e676"
112
+ COLOR_WARNING = "#ffb700"
113
+ COLOR_DANGER = "#ff3232"
114
+ COLOR_BG = "#080c12"
115
+ COLOR_CARD = "#0d1b2a"
116
+ COLOR_MUTED = "#2e6a80"
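Note on `RAW_REWARD_MIN` and `RAW_REWARD_MAX`: the config describes them as the scaling reference for normalization, but the normalization code itself is not part of this file. The sketch below assumes plain min-max scaling clamped to [0, 1]; that formula and the helper name `normalize_reward` are assumptions made for illustration, and only the constants are taken from `config.py`.

```python
# Constants copied from src/config.py.
RAW_REWARD_MIN = -60.0
RAW_REWARD_MAX = 160.0
SUCCESS_SCORE_THRESHOLD = 0.60


def normalize_reward(raw: float,
                     lo: float = RAW_REWARD_MIN,
                     hi: float = RAW_REWARD_MAX) -> float:
    """Map a raw reward onto [0, 1] with min-max scaling (assumed formula)."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    scaled = (raw - lo) / (hi - lo)
    # Clamp so out-of-range raw rewards cannot leave the unit interval.
    return max(0.0, min(1.0, scaled))


if __name__ == "__main__":
    for raw in (-60.0, 50.0, 72.0, 160.0, 500.0):
        score = normalize_reward(raw)
        verdict = "success" if score >= SUCCESS_SCORE_THRESHOLD else "below threshold"
        print(f"raw={raw:7.1f} -> normalized={score:.3f} ({verdict})")
```

Under this assumption the 220-point raw range means a raw reward of about 72 corresponds to the 0.60 success threshold.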
src/env.py ADDED
@@ -0,0 +1,358 @@
1
+ """
2
+ env.py — SpectraQual OpenEnv-Compliant Environment
3
+ Implements the full OpenEnv interface: reset() / step() / state()
4
+ with seeding, anomaly detection, episode management, and rolling metrics.
5
+ """
6
+
7
+ from __future__ import annotations
8
+ import random
9
+ import sys
10
+ import os
11
+ from typing import Dict, Any, Optional, List
12
+
13
+ # Allow running from src/ directory directly
14
+ sys.path.insert(0, os.path.dirname(__file__))
15
+
16
+ from config import (
17
+ DEFECT_TYPES,
18
+ VALID_ACTIONS,
19
+ N_SOLDERING_SLOTS,
20
+ SOLDERING_JOB_DURATION,
21
+ COMPONENT_COST_MIN,
22
+ COMPONENT_COST_MAX,
23
+ CRITICALITY_MIN,
24
+ CRITICALITY_MAX,
25
+ TASKS,
26
+ )
27
+ from models import PCBObservation, PCBAction, StepResult, RewardComponents
28
+ from reward import calculate_reward, detect_anomaly
29
+
30
+
31
+ # ---------------------------
32
+ # SPECTRAQUAL ENVIRONMENT
33
+ # ---------------------------
34
+ class SpectraQualEnv:
35
+ """
36
+ PCB Smart Quality-Control Triage Environment.
37
+
38
+     An AI agent processes a stream of printed circuit boards, each assigned a
39
+     defect by a seeded (and therefore reproducible) random generator. The agent must choose
40
+     the optimal triage action given economic constraints and factory slot availability.
41
+
42
+ Implements the OpenEnv interface:
43
+ reset() → StepResult (initial observation)
44
+ step() → StepResult
45
+ state() → dict (full internal state)
46
+ """
47
+
48
+ def __init__(self, task_id: str = "task_easy", seed: Optional[int] = None):
49
+ if task_id not in TASKS:
50
+ raise ValueError(f"Unknown task_id '{task_id}'. Valid: {list(TASKS.keys())}")
51
+
52
+ self.task_cfg = TASKS[task_id]
53
+ self.task_id = task_id
54
+ self.seed = seed if seed is not None else self.task_cfg["seed"]
55
+ self._rng = random.Random(self.seed)
56
+
57
+ # Runtime state (initialized on reset)
58
+ self._slots: List[int] = []
59
+ self._step_num: int = 0
60
+ self._done: bool = True
61
+ self._current_pcb: Optional[Dict] = None
62
+ self._correct_count: int = 0
63
+ self._total_count: int = 0
64
+ self._bottleneck_cnt: int = 0
65
+ self._anomaly_total: int = 0
66
+ self._anomaly_flagged:int = 0
67
+ self._cumulative_reward: float = 0.0
68
+ self._reward_history: List[float] = []
69
+ self._all_rewards: List[float] = []
70
+
+     # ------------------------------------------------
+     # INTERNAL HELPERS
+     # ------------------------------------------------
+     def _reset_slots(self) -> None:
+         n = self.task_cfg["n_slots"]
+         # Fill remaining slots with 0 (free) up to N_SOLDERING_SLOTS
+         self._slots = [0] * N_SOLDERING_SLOTS
+         # Mark slots beyond the task limit as permanently busy (simulates fewer slots)
+         for i in range(n, N_SOLDERING_SLOTS):
+             self._slots[i] = 9999  # permanently locked
+
+     def _get_slot_view(self) -> List[int]:
+         """Public view: replace 9999 sentinel with -1 for clarity."""
+         return [s if s != 9999 else -1 for s in self._slots]
+
+     def _count_free_slots(self) -> int:
+         return sum(1 for s in self._slots if s == 0)
+
+     def _tick_slots(self) -> None:
+         """Advance factory time: reduce non-locked slot timers by 1."""
+         for i in range(len(self._slots)):
+             if 0 < self._slots[i] < 9999:
+                 self._slots[i] -= 1
+
+     def _assign_slot(self) -> bool:
+         """Try to assign a soldering job. Returns True if successful."""
+         for i in range(len(self._slots)):
+             if self._slots[i] == 0:
+                 self._slots[i] = SOLDERING_JOB_DURATION
+                 return True
+         return False
+
+     def _generate_pcb(self) -> Dict[str, Any]:
+         """Generate a random PCB using internal seeded RNG."""
+         # Inject anomaly based on task config
+         anomaly_roll = self._rng.random()
+         anomaly_rate = self.task_cfg.get("anomaly_rate", 0.0)
+
+         if anomaly_rate > 0 and anomaly_roll < anomaly_rate:
+             # Force extreme values
+             cost = round(self._rng.uniform(185.0, 200.0), 2)
+             criticality = round(self._rng.uniform(0.93, 1.0), 2)
+             defect = self._rng.choice(["missing_component", "short_circuit"])
+         else:
+             defect = self._rng.choice(DEFECT_TYPES)
+             cost = round(self._rng.uniform(COMPONENT_COST_MIN, COMPONENT_COST_MAX), 2)
+             criticality = round(self._rng.uniform(CRITICALITY_MIN, CRITICALITY_MAX), 2)
+
+         board_id = f"SQ-{self._rng.randint(1000, 9999)}"
+
+         return {
+             "board_id": board_id,
+             "defect_type": defect,
+             "component_cost": cost,
+             "criticality": criticality,
+         }
+
+     def _is_correct(self, defect: str, action: str) -> bool:
+         """Check if action is the single best action for this defect."""
+         best = {
+             "none": "PASS",
+             "missing_component": "ROUTE_COMPONENT_REPLACEMENT",
+             "solder_bridge": "ROUTE_SOLDERING",
+             "short_circuit": "SCRAP",
+         }
+         return best.get(defect) == action
+
+     def _build_observation(self, is_anomaly: bool, anomaly_score: float) -> PCBObservation:
+         pcb = self._current_pcb
+         defect = pcb["defect_type"]
+         free_slots = self._count_free_slots()
+         slot_view = self._get_slot_view()
+         total = self._total_count or 1
+
+         return PCBObservation(
+             board_id=pcb["board_id"],
+             defect_type=defect,
+             component_cost=pcb["component_cost"],
+             criticality=pcb["criticality"],
+             slots_free=free_slots,
+             slots_state=slot_view,
+             is_anomaly=is_anomaly,
+             anomaly_score=round(anomaly_score, 4),
+             step=self._step_num,
+             task_id=self.task_id,
+             valid_actions=VALID_ACTIONS.get(defect, ["SCRAP"]),
+             rolling_accuracy=round(self._correct_count / total, 4),
+             throughput=round(self._total_count / max(self._step_num, 1), 4),
+             cumulative_reward=round(self._cumulative_reward, 4),
+         )
+
+     # ------------------------------------------------
+     # PUBLIC OPENENV INTERFACE
+     # ------------------------------------------------
+     def reset(self) -> StepResult:
+         """
+         Reset the environment to a clean initial state.
+         Returns the first observation without a reward.
+         """
+         self._rng = random.Random(self.seed)
+         self._step_num = 0
+         self._done = False
+         self._correct_count = 0
+         self._total_count = 0
+         self._bottleneck_cnt = 0
+         self._anomaly_total = 0
+         self._anomaly_flagged = 0
+         self._cumulative_reward = 0.0
+         self._reward_history = []
+         self._all_rewards = []
+
+         self._reset_slots()
+         self._current_pcb = self._generate_pcb()
+
+         # Anomalies are counted in step(), when the board is actually processed.
+         # Counting here as well would count the first board twice.
+         is_anomaly, anomaly_score = detect_anomaly(self._current_pcb)
+
+         obs = self._build_observation(is_anomaly, anomaly_score)
+
+         return StepResult(
+             observation=obs,
+             reward=0.0,
+             reward_components=None,
+             done=False,
+             info={"message": "Environment reset. Episode started.", "seed": self.seed},
+         )
+
+     def step(self, action: PCBAction) -> StepResult:
+         """
+         Apply an action to the current board.
+         Advances factory state, computes reward, generates next PCB.
+         """
+         if self._done:
+             raise RuntimeError("Episode is done. Call reset() before stepping.")
+
+         self._step_num += 1
+         self._total_count += 1
+         action_str = action.action
+         pcb = self._current_pcb
+         defect = pcb["defect_type"]
+
+         # Check if action is valid (penalize but don't crash)
+         valid = VALID_ACTIONS.get(defect, ["SCRAP"])
+         if action_str not in valid:
+             # Remap invalid action to SCRAP (safe fallback)
+             action_str = "SCRAP"
+
+         # Factory tick
+         self._tick_slots()
+
+         # Snapshot the slots before any assignment. The queue reward expects the
+         # pre-assignment state; scoring the post-assignment list would report a
+         # bottleneck whenever a job successfully takes the last free slot.
+         slots_before = list(self._slots)
+
+         # Handle soldering slot assignment
+         if action_str == "ROUTE_SOLDERING":
+             assigned = self._assign_slot()
+             if not assigned:
+                 self._bottleneck_cnt += 1
+
+         # Anomaly detection
+         is_anomaly, anomaly_score = detect_anomaly(pcb)
+         if is_anomaly:
+             self._anomaly_total += 1
+             # Track if agent "handled" anomaly correctly (chose optimal action)
+             if self._is_correct(defect, action_str):
+                 self._anomaly_flagged += 1
+
+         # Reward
+         rc = calculate_reward(
+             pcb=pcb,
+             action=action_str,
+             slots_state=slots_before,
+             is_anomaly=is_anomaly,
+         )
+         reward = rc.normalized
+         self._cumulative_reward += reward
+         self._all_rewards.append(reward)
+         self._reward_history.append(reward)
+
+         # Accuracy tracking
+         if self._is_correct(defect, action_str):
+             self._correct_count += 1
+
+         # Episode done?
+         max_boards = self.task_cfg["n_boards"]
+         done = (self._total_count >= max_boards)
+         self._done = done
+
+         # Prepare the next PCB; once the episode is done, the last PCB is reused
+         if not done:
+             self._current_pcb = self._generate_pcb()
+             next_is_anomaly, next_anomaly_score = detect_anomaly(self._current_pcb)
+         else:
+             # Episode over: reuse last PCB for observation
+             next_is_anomaly, next_anomaly_score = is_anomaly, anomaly_score
+
+         obs = self._build_observation(next_is_anomaly, next_anomaly_score)
+
+         return StepResult(
+             observation=obs,
+             reward=reward,
+             reward_components=rc,
+             done=done,
+             info={
+                 "action_taken": action_str,
+                 "defect": defect,
+                 "board_id": pcb["board_id"],
+                 "is_anomaly": is_anomaly,
+                 "anomaly_score": round(anomaly_score, 4),
+                 "bottleneck_count": self._bottleneck_cnt,
+                 "step": self._step_num,
+                 "correct_count": self._correct_count,
+                 "total_count": self._total_count,
+             },
+         )
+
+     def state(self) -> Dict[str, Any]:
+         """Return the full internal environment state as a dict."""
+         return {
+             "task_id": self.task_id,
+             "seed": self.seed,
+             "step": self._step_num,
+             "done": self._done,
+             "slots": self._get_slot_view(),
+             "free_slots": self._count_free_slots(),
+             "current_pcb": self._current_pcb,
+             "correct_count": self._correct_count,
+             "total_count": self._total_count,
+             "bottleneck_count": self._bottleneck_cnt,
+             "anomaly_total": self._anomaly_total,
+             "anomaly_flagged": self._anomaly_flagged,
+             "cumulative_reward": round(self._cumulative_reward, 4),
+             "reward_history": self._all_rewards,
+             "rolling_accuracy": round(self._correct_count / max(self._total_count, 1), 4),
+             "throughput": round(self._total_count / max(self._step_num, 1), 4),
+         }
+
+
+ # ---------------------------
+ # LEGACY COMPAT (for main.py / train.py / app.py)
+ # ---------------------------
+ # The old code imported module-level factory dict + generate_pcb / decide_action etc.
+ # We keep those here as thin wrappers so existing imports don't break.
+
+ _default_env = SpectraQualEnv("task_easy")
+ # Slots are only created on reset(); initialize them here so the legacy helpers
+ # never see an empty slot list (which has no free slots and a zero length).
+ _default_env._reset_slots()
+
+ factory = {"soldering_slots": _default_env._slots}
+
+
+ def generate_pcb():
+     return _default_env._generate_pcb()
+
+
+ def update_factory():
+     _default_env._tick_slots()
+     factory["soldering_slots"] = _default_env._get_slot_view()
+
+
+ def assign_soldering_job():
+     return _default_env._assign_slot()
+
+
+ def decide_action(pcb):
+     """Legacy rule-based decision (used by main.py)."""
+     from config import VALID_ACTIONS
+     defect = pcb["defect_type"]
+     cost = pcb["component_cost"]
+     critical = pcb["criticality"]
+
+     if defect == "none":
+         return "PASS"
+     if defect == "missing_component":
+         return "ROUTE_COMPONENT_REPLACEMENT" if cost > 50 else "SCRAP"
+     if defect == "solder_bridge":
+         return "ROUTE_SOLDERING" if _default_env._count_free_slots() > 0 else "WAIT"
+     if defect == "short_circuit":
+         return "SCRAP" if critical > 0.7 else "ROUTE_DIAGNOSTICS"
+     return "SCRAP"
+
+
+ def calculate_reward_legacy(pcb, decision):
+     """Legacy single-float reward (used by train.py)."""
+     rc = calculate_reward(
+         pcb=pcb,
+         action=decision,
+         slots_state=_default_env._slots,
+         is_anomaly=False,
+     )
+     # Scale normalized [0,1] back to a range train.py expects
+     return (rc.normalized - 0.5) * 200
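
The slot bookkeeping in `env.py` (a timer per soldering slot, `0` for free, `9999` as the locked sentinel) can be tried in isolation. The following is a minimal sketch, not the repository's code: `JOB_DURATION = 3` is an assumed stand-in for `SOLDERING_JOB_DURATION`, whose real value lives in `config.py` and is not part of this diff, and the helper names `make_slots`, `tick`, and `assign` are hypothetical.

```python
from typing import List

LOCKED = 9999      # same sentinel env.py uses for slots disabled by the task
JOB_DURATION = 3   # assumption: placeholder for SOLDERING_JOB_DURATION in config.py


def make_slots(n_usable: int, n_total: int) -> List[int]:
    """0 means free; slots beyond the task limit stay locked."""
    return [0] * n_usable + [LOCKED] * (n_total - n_usable)


def tick(slots: List[int]) -> None:
    """Advance time by one step: count down every busy, non-locked slot."""
    for i, remaining in enumerate(slots):
        if 0 < remaining < LOCKED:
            slots[i] = remaining - 1


def assign(slots: List[int]) -> bool:
    """Start a job in the first free slot; False signals a bottleneck."""
    for i, remaining in enumerate(slots):
        if remaining == 0:
            slots[i] = JOB_DURATION
            return True
    return False


slots = make_slots(n_usable=1, n_total=3)
first = assign(slots)     # takes the only usable slot
second = assign(slots)    # no free slot left, so this is a bottleneck
for _ in range(JOB_DURATION):
    tick(slots)           # the running job counts down to 0
third = assign(slots)     # the slot is free again
print(first, second, third, slots)
```

With a single usable slot, the second assignment fails until enough ticks have passed, which is the queue pressure the medium and hard tasks appear to rely on.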
src/main.py ADDED
@@ -0,0 +1,28 @@
+ from env import generate_pcb, decide_action, update_factory, factory
+ from env import calculate_reward_legacy as calculate_reward  # two-argument reward that returns a float
+
+ TOTAL_BOARDS = 10
+ total_score = 0
+
+ # Reset factory
+ factory["soldering_slots"] = [0, 0, 0]
+
+ for i in range(TOTAL_BOARDS):
+
+     print(f"\n--- TIME STEP {i+1} ---")
+
+     # Update factory (time passes)
+     update_factory()
+
+     pcb = generate_pcb()
+     decision = decide_action(pcb)
+     reward = calculate_reward(pcb, decision)
+
+     total_score += reward
+
+     print(f"PCB: {pcb}")
+     print(f"Decision: {decision}")
+     print(f"Reward: {round(reward,2)}")
+     print(f"Factory Slots: {factory['soldering_slots']}")
+
+ print("\n⚔️ Total Economic Score:", round(total_score,2))
src/models.py ADDED
@@ -0,0 +1,140 @@
+ """
+ models.py — SpectraQual Typed Pydantic Models
+ OpenEnv spec requires: typed Observation, Action, Reward models.
+ """
+
+ from __future__ import annotations
+ from typing import List, Literal, Optional, Dict, Any
+ from pydantic import BaseModel, Field
+
+
+ # ---------------------------
+ # PCB OBSERVATION
+ # ---------------------------
+ class PCBObservation(BaseModel):
+     """Observation returned after each reset() or step()."""
+
+     board_id: str = Field(..., description="Unique board identifier, e.g. SQ-4321")
+     defect_type: Literal[
+         "none", "missing_component", "solder_bridge", "short_circuit"
+     ] = Field(..., description="Type of defect detected on the PCB")
+     component_cost: float = Field(
+         ..., ge=10.0, le=200.0, description="Replacement cost of damaged component in ₹"
+     )
+     criticality: float = Field(
+         ..., ge=0.1, le=1.0, description="Risk score — higher means more critical circuit"
+     )
+     slots_free: int = Field(
+         ..., ge=0, description="Number of soldering slots currently available"
+     )
+     slots_state: List[int] = Field(
+         ..., description="Remaining time units for each soldering slot (0=free, -1=locked for this task)"
+     )
+     is_anomaly: bool = Field(
+         False, description="True if this board exhibits rare/unusual characteristics"
+     )
+     anomaly_score: float = Field(
+         0.0, ge=0.0, le=1.0, description="Anomaly confidence (0=normal, 1=highly anomalous)"
+     )
+     step: int = Field(..., ge=0, description="Current step number in the episode")
+     task_id: str = Field(..., description="ID of the active task")
+     valid_actions: List[str] = Field(
+         ..., description="List of valid actions for this observation"
+     )
+
+     # --- Real-time metrics ---
+     rolling_accuracy: float = Field(
+         0.0, ge=0.0, le=1.0, description="Fraction of correct decisions so far"
+     )
+     throughput: float = Field(
+         0.0, ge=0.0, description="Boards processed per step so far"
+     )
+     cumulative_reward: float = Field(
+         0.0, description="Cumulative normalized reward so far in this episode"
+     )
+
+
+ # ---------------------------
+ # PCB ACTION
+ # ---------------------------
+ class PCBAction(BaseModel):
+     """Action submitted by an agent to the environment."""
+
+     action: Literal[
+         "PASS",
+         "SCRAP",
+         "ROUTE_COMPONENT_REPLACEMENT",
+         "ROUTE_SOLDERING",
+         "ROUTE_DIAGNOSTICS",
+         "WAIT",
+     ] = Field(..., description="Decision made for the current PCB")
+
+
+ # ---------------------------
+ # REWARD COMPONENTS
+ # ---------------------------
+ class RewardComponents(BaseModel):
+     """Decomposed reward signal for transparency and debugging."""
+
+     defect_reward: float = Field(
+         ..., description="Score for handling the defect correctly (0.0–1.0)"
+     )
+     cost_efficiency: float = Field(
+         ..., description="Economic value retained vs. lost (0.0–1.0)"
+     )
+     queue_penalty: float = Field(
+         ..., description="Penalty for creating factory bottlenecks (0.0–1.0, lower is worse)"
+     )
+     criticality_factor: float = Field(
+         ..., description="Risk-adjusted modifier based on criticality (0.0–1.0)"
+     )
+     anomaly_bonus: float = Field(
+         0.0, description="Bonus for correctly flagging/handling anomalous board (0.0–1.0)"
+     )
+     total_raw: float = Field(
+         ..., description="Weighted sum of all components before normalization"
+     )
+     normalized: float = Field(
+         ..., ge=0.0, le=1.0, description="Final normalized reward in [0.0, 1.0]"
+     )
+     explanation: str = Field(
+         ..., description="Human-readable explanation of why this reward was given"
+     )
+
+
+ # ---------------------------
+ # STEP RESULT
+ # ---------------------------
+ class StepResult(BaseModel):
+     """Full result returned by step() and reset()."""
+
+     observation: PCBObservation
+     reward: float = Field(
+         0.0, ge=0.0, le=1.0, description="Normalized reward for this step [0.0, 1.0]"
+     )
+     reward_components: Optional[RewardComponents] = Field(
+         None, description="Detailed breakdown of reward components"
+     )
+     done: bool = Field(..., description="True if the episode has ended")
+     info: Dict[str, Any] = Field(
+         default_factory=dict, description="Additional diagnostic info"
+     )
+
+
+ # ---------------------------
+ # TASK RESULT (for graders)
+ # ---------------------------
+ class TaskResult(BaseModel):
+     """Summary of a completed task run, consumed by graders."""
+
+     task_id: str
+     total_steps: int
+     rewards: List[float]  # per-step normalized rewards
+     correct_decisions: int
+     total_decisions: int
+     bottleneck_count: int  # times queue was maxed out
+     anomaly_total: int  # how many anomaly boards appeared
+     anomaly_flagged: int  # how many the agent correctly flagged
+     cumulative_raw_reward: float
+     max_possible_raw: float
+     final_score: float = 0.0  # filled by grader
src/reward.py ADDED
@@ -0,0 +1,288 @@
+ """
+ reward.py — SpectraQual Multi-Component Normalized Reward
+ Replaces duplicated logic in env.py and old reward.py.
+
+ Reward is decomposed into 5 components and normalized to [0.0, 1.0].
+ This gives the agent a rich, non-sparse signal at every step.
+ """
+
+ from __future__ import annotations
+ import math
+ from typing import Dict, Any, List
+
+ from config import (
+     REWARD_WEIGHT_DEFECT,
+     REWARD_WEIGHT_COST,
+     REWARD_WEIGHT_QUEUE,
+     REWARD_WEIGHT_CRITICALITY,
+     REWARD_WEIGHT_ANOMALY,
+     COMPONENT_COST_MIN,
+     COMPONENT_COST_MAX,
+     ANOMALY_COST_THRESHOLD,
+     ANOMALY_CRITICALITY_THRESHOLD,
+ )
+ from models import RewardComponents
+
+
+ # ---------------------------
+ # NORMALIZATION HELPERS
+ # ---------------------------
+ def _sigmoid_normalize(x: float, scale: float = 0.025) -> float:
+     """Sigmoid-based normalization: output is always in (0, 1)."""
+     return 1.0 / (1.0 + math.exp(-scale * x))
+
+
+ def _clamp(x: float, lo: float = 0.0, hi: float = 1.0) -> float:
+     return max(lo, min(hi, x))
+
+
+ def _cost_fraction(cost: float) -> float:
+     """Normalize cost into [0, 1] range."""
+     return (cost - COMPONENT_COST_MIN) / (COMPONENT_COST_MAX - COMPONENT_COST_MIN)
+
+
+ # ---------------------------
+ # ANOMALY DETECTION
+ # ---------------------------
+ def detect_anomaly(pcb: Dict[str, Any]) -> tuple[bool, float]:
+     """
+     Flag a board as an anomaly if it has extreme cost AND high criticality.
+     Returns (is_anomaly, anomaly_score 0.0–1.0).
+     """
+     cost_flag = pcb["component_cost"] >= ANOMALY_COST_THRESHOLD
+     critical_flag = pcb["criticality"] >= ANOMALY_CRITICALITY_THRESHOLD
+
+     if cost_flag and critical_flag:
+         # Combine both signals into a confidence score
+         cost_score = _cost_fraction(pcb["component_cost"])
+         critical_score = pcb["criticality"]
+         anomaly_score = _clamp(0.5 * cost_score + 0.5 * critical_score)
+         return True, anomaly_score
+
+     # Partial anomaly: one signal strong
+     if cost_flag or critical_flag:
+         score = _cost_fraction(pcb["component_cost"]) * 0.4 + pcb["criticality"] * 0.3
+         return False, _clamp(score)
+
+     return False, 0.0
+
+
+ # ---------------------------
+ # COMPONENT 1 — DEFECT REWARD
+ # ---------------------------
+ def _defect_component(defect: str, action: str) -> tuple[float, str]:
+     """
+     Score the correctness of the action given the defect type.
+     Returns (raw_score 0.0–1.0, explanation_fragment)
+     """
+     mapping = {
+         ("none", "PASS"): (1.00, "Correct PASS on clean board"),
+         ("none", "SCRAP"): (0.00, "Wasteful SCRAP on clean board"),
+         ("missing_component", "ROUTE_COMPONENT_REPLACEMENT"): (1.00, "Optimal route for missing component"),
+         ("missing_component", "SCRAP"): (0.30, "Suboptimal SCRAP — value lost"),
+         ("solder_bridge", "ROUTE_SOLDERING"): (1.00, "Correct soldering route"),
+         ("solder_bridge", "WAIT"): (0.40, "WAIT acceptable — preserves board"),
+         ("solder_bridge", "SCRAP"): (0.10, "Poor choice — solder bridge is repairable"),
+         ("short_circuit", "SCRAP"): (1.00, "Correct SCRAP for high-risk short circuit"),
+         ("short_circuit", "ROUTE_DIAGNOSTICS"): (0.80, "Diagnostics acceptable for low-risk short"),
+         ("short_circuit", "PASS"): (0.00, "Dangerous PASS on short circuit"),
+     }
+     key = (defect, action)
+     if key in mapping:
+         score, expl = mapping[key]
+         return score, expl
+     # Any other invalid combination
+     return 0.05, f"Invalid action '{action}' for defect '{defect}'"
+
+
+ # ---------------------------
+ # COMPONENT 2 — COST EFFICIENCY
+ # ---------------------------
+ def _cost_component(defect: str, action: str, cost: float) -> tuple[float, str]:
+     """
+     Measure economic efficiency of the decision.
+     Returns (score 0.0–1.0, explanation_fragment)
+     """
+     cf = _cost_fraction(cost)
+
+     if defect == "none":
+         return (1.0, "No cost involved in PASS") if action == "PASS" else (0.5, "Unnecessary action cost")
+
+     if defect == "missing_component":
+         if action == "ROUTE_COMPONENT_REPLACEMENT":
+             # High-cost boards benefit more from repair
+             return (_clamp(0.5 + 0.5 * cf), f"Repair recovers {cf:.0%} of component value")
+         else:  # SCRAP
+             # Scrapping expensive boards wastes value
+             return (_clamp(1.0 - cf), f"Scrap wastes {cf:.0%} of component value")
+
+     if defect == "solder_bridge":
+         if action == "ROUTE_SOLDERING":
+             return (_clamp(0.6 + 0.3 * cf), "Soldering route recovers board value")
+         elif action == "WAIT":
+             return (0.45, "WAIT preserves board but delays throughput")
+         else:  # SCRAP
+             return (_clamp(0.5 - 0.4 * cf), "Scrapping repairable board is costly")
+
+     if defect == "short_circuit":
+         if action == "SCRAP":
+             return (0.80, "Scrapping avoids downstream failure cost")
+         elif action == "ROUTE_DIAGNOSTICS":
+             return (0.70, "Diagnostics adds some cost but recovers revenue")
+         else:
+             return (0.10, "Wrong action risks high downstream failure penalty")
+
+     return (0.3, "Unknown defect/action combination")
+
+
+ # ---------------------------
+ # COMPONENT 3 — QUEUE PENALTY
+ # ---------------------------
+ def _queue_component(action: str, slots_state: List[int]) -> tuple[float, str]:
+     """
+     Penalize bottleneck creation. Returns (score 0.0–1.0, explanation_fragment).
+     High score = no queue problem. Low score = bad queue usage.
+     """
+     free_slots = slots_state.count(0)
+     # Guard against an empty slot list, which would otherwise divide by zero below
+     total_slots = max(len(slots_state), 1)
+
+     if action == "ROUTE_SOLDERING":
+         if free_slots > 0:
+             utilization = 1.0 - (free_slots - 1) / total_slots
+             return (_clamp(0.6 + 0.4 * utilization),
+                     f"Soldering assigned to free slot ({free_slots - 1} remaining)")
+         else:
+             # All slots full → bottleneck
+             return (0.0, "BOTTLENECK: all soldering slots occupied")
+
+     if action == "WAIT":
+         if free_slots == 0:
+             return (0.55, "WAIT appropriate — no slot available")
+         else:
+             return (0.35, "Unnecessary WAIT — slots were available")
+
+     # Non-soldering actions don't stress the queue
+     occupancy_ratio = sum(1 for s in slots_state if s > 0) / total_slots
+     return (_clamp(1.0 - 0.2 * occupancy_ratio), "No queue impact from this action")
+
+
+ # ---------------------------
+ # COMPONENT 4 — CRITICALITY
+ # ---------------------------
+ def _criticality_component(defect: str, action: str, criticality: float) -> tuple[float, str]:
+     """
+     Risk-adjust the decision based on board criticality.
+     High-criticality wrong decisions are severely penalized.
+     """
+     # Optimal action scores well regardless of criticality
+     optimal = {
+         "none": "PASS",
+         "missing_component": "ROUTE_COMPONENT_REPLACEMENT",
+         "solder_bridge": "ROUTE_SOLDERING",
+         "short_circuit": "SCRAP",
+     }
+     is_optimal = (optimal.get(defect) == action)
+
+     if is_optimal:
+         # Reward scales slightly with criticality — making the right call on risky boards is harder
+         return (_clamp(0.7 + 0.3 * criticality), f"Correct action on criticality={criticality:.2f} board")
+
+     if defect == "short_circuit" and action not in ("SCRAP", "ROUTE_DIAGNOSTICS"):
+         # Dangerous wrong action on high-criticality board
+         penalty = criticality
+         return (_clamp(1.0 - penalty), f"Risky action on critical short_circuit board (criticality={criticality:.2f})")
+
+     # Sub-optimal but not dangerous
+     return (_clamp(0.5 - 0.2 * criticality), f"Sub-optimal action with criticality={criticality:.2f}")
+
+
+ # ---------------------------
+ # COMPONENT 5 — ANOMALY BONUS
+ # ---------------------------
+ def _anomaly_component(is_anomaly: bool, action: str, defect: str) -> tuple[float, str]:
+     """
+     Bonus for handling anomalous boards correctly.
+     For inference.py the LLM can't explicitly 'flag' anomalies, so we reward
+     it for choosing the safest action on anomaly boards.
+     """
+     if not is_anomaly:
+         return (0.5, "Normal board — no anomaly bonus/penalty")
+
+     # Best safe action on anomaly board
+     safe_actions = {
+         "none": "PASS",
+         "missing_component": "ROUTE_COMPONENT_REPLACEMENT",
+         "solder_bridge": "ROUTE_SOLDERING",
+         "short_circuit": "SCRAP",
+     }
+     if action == safe_actions.get(defect):
+         return (1.0, "Correct safe action on anomaly board — BONUS")
+     elif action == "SCRAP":
+         return (0.6, "Conservative SCRAP on anomaly board")
+     else:
+         return (0.1, "Risky action on anomaly board — PENALTY")
+
+
+ # ---------------------------
+ # MASTER REWARD CALCULATOR
+ # ---------------------------
+ def calculate_reward(
+     pcb: Dict[str, Any],
+     action: str,
+     slots_state: List[int],
+     is_anomaly: bool = False,
+ ) -> RewardComponents:
+     """
+     Compute multi-component normalized reward for a (pcb, action) pair.
+
+     Args:
+         pcb: dict with defect_type, component_cost, criticality
+         action: one of the 6 valid action strings
+         slots_state: list of slot remaining times, e.g. [0, 2, 0]
+         is_anomaly: whether this board was flagged as anomalous
+
+     Returns:
+         RewardComponents with individual scores and final normalized reward.
+     """
+     defect = pcb["defect_type"]
+     cost = pcb["component_cost"]
+     criticality = pcb["criticality"]
+
+     # Compute each component
+     d_score, d_expl = _defect_component(defect, action)
+     c_score, c_expl = _cost_component(defect, action, cost)
+     q_score, q_expl = _queue_component(action, slots_state)
+     r_score, r_expl = _criticality_component(defect, action, criticality)
+     a_score, a_expl = _anomaly_component(is_anomaly, action, defect)
+
+     # Weighted sum
+     raw = (
+         REWARD_WEIGHT_DEFECT * d_score +
+         REWARD_WEIGHT_COST * c_score +
+         REWARD_WEIGHT_QUEUE * q_score +
+         REWARD_WEIGHT_CRITICALITY * r_score +
+         REWARD_WEIGHT_ANOMALY * a_score
+     )
+     normalized = _clamp(raw)
+
+     # Build explanation
+     parts = [
+         f"[Defect {d_score:.2f}] {d_expl}",
+         f"[Cost {c_score:.2f}] {c_expl}",
+         f"[Queue {q_score:.2f}] {q_expl}",
+         f"[Risk {r_score:.2f}] {r_expl}",
+     ]
+     if is_anomaly:
+         parts.append(f"[Anomaly {a_score:.2f}] {a_expl}")
+     explanation = " | ".join(parts)
+
+     return RewardComponents(
+         defect_reward=round(d_score, 4),
+         cost_efficiency=round(c_score, 4),
+         queue_penalty=round(q_score, 4),
+         criticality_factor=round(r_score, 4),
+         anomaly_bonus=round(a_score, 4),
+         total_raw=round(raw, 4),
+         normalized=round(normalized, 4),
+         explanation=explanation,
+     )
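
The aggregation step in `calculate_reward` is a weighted sum of five component scores followed by a clamp to [0, 1]. The sketch below illustrates only that step. The weights are hypothetical: the real `REWARD_WEIGHT_*` constants are defined in `config.py`, which this diff does not show, and the values here were simply chosen to sum to 1.0. The names `WEIGHTS` and `combine` are likewise illustrative.

```python
from typing import Dict

# Hypothetical weights; the actual REWARD_WEIGHT_* values are in config.py.
WEIGHTS: Dict[str, float] = {
    "defect": 0.35,
    "cost": 0.25,
    "queue": 0.15,
    "criticality": 0.15,
    "anomaly": 0.10,
}


def clamp(x: float, lo: float = 0.0, hi: float = 1.0) -> float:
    """Keep the weighted sum inside the reward range."""
    return max(lo, min(hi, x))


def combine(scores: Dict[str, float]) -> float:
    """Weighted sum of per-component scores, clamped to [0, 1]."""
    raw = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    return clamp(raw)


perfect = combine({name: 1.0 for name in WEIGHTS})
worst = combine({name: 0.0 for name in WEIGHTS})
# A correct action that nevertheless hits a queue bottleneck (queue score 0.0)
mixed = combine(
    {"defect": 1.0, "cost": 0.8, "queue": 0.0, "criticality": 0.9, "anomaly": 0.5}
)
print(perfect, worst, round(mixed, 4))
```

If the weights sum to 1.0 and every component stays in [0, 1], the clamp only absorbs floating-point drift; it matters more if the configured weights sum to something else.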
src/tasks.py ADDED
@@ -0,0 +1,262 @@
+ """
+ tasks.py — SpectraQual Task Definitions and Programmatic Graders
+ Each task runs the environment with a fixed seed and scores the agent 0.0–1.0.
+ Graders are deterministic and reproducible.
+ """
+
+ from __future__ import annotations
+ import sys
+ import os
+ from typing import List
+
+ sys.path.insert(0, os.path.dirname(__file__))
+
+ from config import (
+     TASKS,
+     MEDIUM_ECONOMIC_TARGET,
+     HARD_ANOMALY_RATE_TARGET,
+     SUCCESS_SCORE_THRESHOLD,
+ )
+ from models import PCBAction, TaskResult
+ from env import SpectraQualEnv
+
+
+ # ---------------------------
+ # TASK RUNNER
+ # ---------------------------
+ def run_task(task_id: str, actions: List[str]) -> TaskResult:
+     """
+     Run a task with a pre-determined list of actions.
+     Used by graders to replay an agent's trajectory deterministically.
+
+     Args:
+         task_id: one of "task_easy", "task_medium", "task_hard"
+         actions: list of action strings, one per step
+
+     Returns:
+         TaskResult with all episode metrics filled in.
+     """
+     cfg = TASKS[task_id]
+     env = SpectraQualEnv(task_id=task_id)
+     env.reset()
+
+     rewards: List[float] = []
+     correct = 0
+     total = 0
+     bottlenecks = 0
+     anomaly_total = 0
+     anomaly_flagged = 0
+     cum_raw = 0.0
+
+     for action_str in actions:
+         if env._done:
+             break
+
+         # Default to SCRAP if the string is not a valid PCBAction value
+         # (pydantic's ValidationError is a ValueError subclass)
+         try:
+             action = PCBAction(action=action_str)
+         except ValueError:
+             action = PCBAction(action="SCRAP")
+         result = env.step(action)
+
+         rewards.append(result.reward)
+         total += 1
+         if result.info.get("is_anomaly"):
+             anomaly_total += 1
+         if result.reward_components:
+             cum_raw += result.reward_components.total_raw
+             if result.info.get("is_anomaly") and result.reward_components.anomaly_bonus >= 0.8:
+                 anomaly_flagged += 1
+
+         if env._is_correct(result.info.get("defect", ""), action_str):
+             correct += 1
+
+     bottlenecks = env._bottleneck_cnt
+
+     max_possible_raw = cfg["n_boards"] * 1.0  # max normalized = 1.0 per step
+
+     return TaskResult(
+         task_id=task_id,
+         total_steps=total,
+         rewards=rewards,
+         correct_decisions=correct,
+         total_decisions=total,
+         bottleneck_count=bottlenecks,
+         anomaly_total=anomaly_total,
+         anomaly_flagged=anomaly_flagged,
+         cumulative_raw_reward=cum_raw,
+         max_possible_raw=max_possible_raw,
+     )
+
+
+ # ---------------------------
+ # GRADER: TASK EASY
+ # ---------------------------
+ def grade_easy(result: TaskResult) -> float:
+     """
+     Task Easy Grader.
+     Objective: Correctly classify all defect types. No slot pressure.
+     Scoring: 0.70 * accuracy + 0.30 * avg_reward, clamped to 0.0–1.0
+
+     Accuracy (correct_decisions / total_decisions) contributes up to 0.70:
+     - 100% correct = 0.70
+     - 80% correct = 0.56
+     - 0% correct = 0.0
+     """
+     if result.total_decisions == 0:
+         return 0.0
+
+     accuracy = result.correct_decisions / result.total_decisions
+
+     # Blend accuracy with average reward for robustness
+     avg_reward = sum(result.rewards) / len(result.rewards) if result.rewards else 0.0
+
+     # Weight: 70% accuracy, 30% reward quality
+     score = 0.70 * accuracy + 0.30 * avg_reward
+     return round(min(max(score, 0.0), 1.0), 4)
+
+
+ # ---------------------------
+ # GRADER: TASK MEDIUM
+ # ---------------------------
+ def grade_medium(result: TaskResult) -> float:
+     """
+     Task Medium Grader.
+     Objective: Triage 15 boards with 1 slot (queue pressure).
+     Scoring: 0.6 * economic_efficiency + 0.4 * bottleneck_avoidance
+
+     - economic_efficiency: avg normalized reward vs target
+     - bottleneck_avoidance: 1.0 if no bottlenecks, scales down to 0
+     """
+     if not result.rewards:
+         return 0.0
+
+     avg_reward = sum(result.rewards) / len(result.rewards)
+
+     # Economic efficiency: how close to target (MEDIUM_ECONOMIC_TARGET = 0.50)
+     economic_score = min(avg_reward / MEDIUM_ECONOMIC_TARGET, 1.0)
+
+     # Bottleneck avoidance: 0 bottleneck = 1.0, ≥5 = 0.0
+     max_tolerable_bottlenecks = 5
+     bottleneck_score = max(0.0, 1.0 - result.bottleneck_count / max_tolerable_bottlenecks)
+
+     score = 0.60 * economic_score + 0.40 * bottleneck_score
+     return round(min(max(score, 0.0), 1.0), 4)
+
+
+ # ---------------------------
+ # GRADER: TASK HARD
+ # ---------------------------
+ def grade_hard(result: TaskResult) -> float:
+     """
+     Task Hard Grader.
+     Objective: 20 boards, mixed anomalies, tight slots.
+     Scoring: 0.5 * anomaly_score + 0.3 * economic_score + 0.2 * throughput_score
+
+     - anomaly_score: anomaly_flagged / anomaly_total (1.0 if no anomalies), scaled by HARD_ANOMALY_RATE_TARGET (target ≥ 0.5)
+     - economic_score: avg normalized reward
+     - throughput_score: total_decisions / n_boards, capped at 1.0 (penalizes WAIT spam)
+     """
+     if not result.rewards:
+         return 0.0
+
+     cfg = TASKS["task_hard"]
+     avg_reward = sum(result.rewards) / len(result.rewards)
+
+     # Anomaly score: did the agent handle anomalous boards correctly?
+     if result.anomaly_total > 0:
+         raw_anomaly = result.anomaly_flagged / result.anomaly_total
+     else:
+         raw_anomaly = 1.0  # no anomalies → not penalized
+
+     # Scale anomaly score: reaching HARD_ANOMALY_RATE_TARGET earns the full 1.0
+     anomaly_score = min(raw_anomaly / HARD_ANOMALY_RATE_TARGET, 1.0)
+
+     # Economic score
+     economic_score = avg_reward
+
+     # Throughput: penalize excessive WAIT actions
+     throughput_score = min(result.total_decisions / cfg["n_boards"], 1.0)
+
+     score = (
+         0.50 * anomaly_score +
+         0.30 * economic_score +
+         0.20 * throughput_score
+     )
+     return round(min(max(score, 0.0), 1.0), 4)
+
+
+ # ---------------------------
+ # GRADER DISPATCH
+ # ---------------------------
+ GRADERS = {
+     "task_easy": grade_easy,
+     "task_medium": grade_medium,
+     "task_hard": grade_hard,
+ }
+
+
+ def grade(task_id: str, result: TaskResult) -> float:
+     """Dispatch to the correct grader for the given task_id."""
+     if task_id not in GRADERS:
+         raise ValueError(f"No grader for task_id='{task_id}'")
+     return GRADERS[task_id](result)
+
+
+ # ---------------------------
+ # TASK DESCRIPTIONS (for README / inference prompt)
+ # ---------------------------
+ TASK_DESCRIPTIONS = {
+     "task_easy": (
+         "Triage 10 PCBs with no factory slot pressure. "
+         "Focus: identify the correct action for each defect type. "
+         "Grader: accuracy-weighted reward (70% accuracy + 30% reward quality). "
+         "Expected frontier model score: ≥0.85."
+     ),
+     "task_medium": (
+         "Triage 15 PCBs with only 1 active soldering slot. "
+         "Focus: manage queue pressure while maintaining economic performance. "
+         "Grader: 60% economic efficiency + 40% bottleneck avoidance. "
+         "Expected frontier model score: ≥0.65."
+     ),
+     "task_hard": (
+         "Triage 20 PCBs with 25% anomaly rate and tight slot constraints. "
+         "Focus: handle extreme-cost/criticality boards safely AND maintain throughput. "
+         "Grader: 50% anomaly handling + 30% economic score + 20% throughput. "
+         "Expected frontier model score: ≥0.50."
+     ),
+ }
+
+
+ # ---------------------------
+ # CLI TEST UTILITY
+ # ---------------------------
+ if __name__ == "__main__":
+     # Quick sanity check: run all 3 tasks with a rule-based agent.
+     from env import SpectraQualEnv, decide_action
+     from models import PCBAction
+
+     print("\n=== SpectraQual Task Grader Sanity Check ===\n")
+
+     for tid in ["task_easy", "task_medium", "task_hard"]:
+         env = SpectraQualEnv(task_id=tid)
+         result_obj = env.reset()
+         actions = []
+
+         while not result_obj.done:
+             obs = result_obj.observation
+             pcb = {
+                 "defect_type": obs.defect_type,
+                 "component_cost": obs.component_cost,
+                 "criticality": obs.criticality,
+             }
+             action_str = decide_action(pcb)
+             actions.append(action_str)
+             result_obj = env.step(PCBAction(action=action_str))
+
+         task_result = run_task(tid, actions)
+         score = grade(tid, task_result)
+         print(f"[{tid}] Score: {score:.4f} | Correct: {task_result.correct_decisions}/{task_result.total_decisions} | Bottlenecks: {task_result.bottleneck_count}")
+
+     print("\n=== Done ===")
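Review note: the three grader formulas in the file above can be checked by hand with a small standalone sketch. `MiniResult` is a hypothetical, trimmed-down stand-in for `TaskResult`. The 0.50 economic target comes from the comment in `grade_medium`; the 0.5 anomaly target is only inferred from the `grade_hard` docstring and may not match the real `HARD_ANOMALY_RATE_TARGET` in the config.

```python
from dataclasses import dataclass, field
from typing import List

# Assumed constants (see the note above); the real values live in the project config.
MEDIUM_ECONOMIC_TARGET = 0.50
HARD_ANOMALY_RATE_TARGET = 0.5


@dataclass
class MiniResult:
    """Hypothetical stand-in for TaskResult with only the graded fields."""
    rewards: List[float] = field(default_factory=list)
    correct_decisions: int = 0
    total_decisions: int = 0
    bottleneck_count: int = 0
    anomaly_total: int = 0
    anomaly_flagged: int = 0


def clamp01(x: float) -> float:
    """Clamp a score into the closed interval [0, 1]."""
    return min(max(x, 0.0), 1.0)


def mean(xs: List[float]) -> float:
    """Average of a list, 0.0 for an empty list."""
    return sum(xs) / len(xs) if xs else 0.0


def easy_score(r: MiniResult) -> float:
    # 70% decision accuracy + 30% average normalized reward
    if r.total_decisions == 0:
        return 0.0
    accuracy = r.correct_decisions / r.total_decisions
    return round(clamp01(0.70 * accuracy + 0.30 * mean(r.rewards)), 4)


def medium_score(r: MiniResult, max_bottlenecks: int = 5) -> float:
    # 60% economic efficiency vs. target + 40% bottleneck avoidance
    if not r.rewards:
        return 0.0
    economic = min(mean(r.rewards) / MEDIUM_ECONOMIC_TARGET, 1.0)
    bottleneck = max(0.0, 1.0 - r.bottleneck_count / max_bottlenecks)
    return round(clamp01(0.60 * economic + 0.40 * bottleneck), 4)


def hard_score(r: MiniResult, n_boards: int = 20) -> float:
    # 50% anomaly handling + 30% average reward + 20% throughput
    if not r.rewards:
        return 0.0
    raw = r.anomaly_flagged / r.anomaly_total if r.anomaly_total > 0 else 1.0
    anomaly = min(raw / HARD_ANOMALY_RATE_TARGET, 1.0)
    throughput = min(r.total_decisions / n_boards, 1.0)
    return round(clamp01(0.50 * anomaly + 0.30 * mean(r.rewards) + 0.20 * throughput), 4)


if __name__ == "__main__":
    easy = MiniResult(rewards=[0.6] * 10, correct_decisions=8, total_decisions=10)
    medium = MiniResult(rewards=[0.4] * 15, total_decisions=15, bottleneck_count=2)
    hard = MiniResult(rewards=[0.5] * 20, total_decisions=20,
                      anomaly_total=5, anomaly_flagged=2)
    print(easy_score(easy), medium_score(medium), hard_score(hard))
```

With these inputs the easy task works out to 0.70 × 0.8 + 0.30 × 0.6, the medium task to 0.60 × 0.8 + 0.40 × 0.6, and the hard task to 0.50 × 0.8 + 0.30 × 0.5 + 0.20 × 1.0, which makes it easy to see how much each weighted term contributes.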
src/train.py ADDED
@@ -0,0 +1,31 @@
+ from env import generate_pcb, calculate_reward, update_factory, factory
+ from agent import get_state, choose_action, update_q
+
+ EPISODES = 500
+ STEPS_PER_EPISODE = 20  # multi-step episodes
+
+ for ep in range(EPISODES):
+
+     factory["soldering_slots"] = [0, 0, 0]
+
+     pcb = generate_pcb()
+     state = get_state(pcb, factory)
+
+     for step in range(STEPS_PER_EPISODE):
+
+         action = choose_action(state)
+
+         update_factory()
+
+         reward = calculate_reward(pcb, action)
+
+         next_pcb = generate_pcb()
+         next_state = get_state(next_pcb, factory)
+
+         update_q(state, action, reward, next_state)
+
+         # move forward
+         pcb = next_pcb
+         state = next_state
+
+ print("Training Complete")
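Review note: the training loop above relies on `choose_action` and `update_q` from the `agent` module, whose definitions are not part of this section. The sketch below shows what tabular Q-learning versions of those two calls could look like. The action list, the hyperparameters and the `rng` parameter are all hypothetical, and the real `src/agent.py` may work differently.

```python
import random
from collections import defaultdict

# Hypothetical action set and hyperparameters, chosen only for illustration.
ACTIONS = ["PASS", "REWORK", "SCRAP"]
ALPHA = 0.1    # learning rate
GAMMA = 0.9    # discount factor
EPSILON = 0.1  # exploration probability

# Q-table: state -> {action: estimated value}; states must be hashable (e.g. tuples).
Q = defaultdict(lambda: {a: 0.0 for a in ACTIONS})


def choose_action(state, rng=random):
    """Epsilon-greedy selection: explore with probability EPSILON, else act greedily."""
    if rng.random() < EPSILON:
        return rng.choice(ACTIONS)
    values = Q[state]
    return max(values, key=values.get)


def update_q(state, action, reward, next_state):
    """One-step Q-learning update toward reward + GAMMA * max_a Q(next_state, a)."""
    best_next = max(Q[next_state].values())
    td_target = reward + GAMMA * best_next
    Q[state][action] += ALPHA * (td_target - Q[state][action])


if __name__ == "__main__":
    s, s_next = ("short", "low"), ("open", "high")
    update_q(s, "SCRAP", 1.0, s_next)
    print(Q[s])
```

Under this sketch the `(state, action, reward, next_state)` tuple passed by the loop is exactly what a one-step temporal-difference update needs, which is consistent with how `train.py` calls `update_q`.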
verify.py ADDED
@@ -0,0 +1,36 @@
+ import sys
+ sys.path.insert(0, 'src')
+
+ from config import TASKS, ACTIONS, VALID_ACTIONS
+ from models import PCBObservation, PCBAction, RewardComponents, StepResult
+ from reward import calculate_reward, detect_anomaly
+ from env import SpectraQualEnv
+
+ print("--- Module imports: OK ---")
+
+ # Test reset and step
+ env = SpectraQualEnv("task_easy")
+ r = env.reset()
+ print(f"reset() -> defect={r.observation.defect_type}, step={r.observation.step}, done={r.done}")
+
+ action = r.observation.valid_actions[0]
+ r2 = env.step(PCBAction(action=action))
+ print(f"step({action}) -> reward={r2.reward:.4f}, done={r2.done}")
+ print(f"  expl: {r2.reward_components.explanation[:80]}")
+
+ state = env.state()
+ print(f"state() -> step={state['step']}, accuracy={state['rolling_accuracy']}")
+
+ # Test all 3 tasks
+ for tid in ["task_easy", "task_medium", "task_hard"]:
+     e = SpectraQualEnv(task_id=tid)
+     rr = e.reset()
+     steps = 0
+     while not rr.done and steps < 30:
+         action_str = rr.observation.valid_actions[0]
+         rr = e.step(PCBAction(action=action_str))
+         steps += 1
+     s = e.state()
+     print(f"[{tid}] steps={steps}, cum_reward={s['cumulative_reward']:.4f}, accuracy={s['rolling_accuracy']:.2%}")
+
+ print("--- All tests: PASS ---")