Space: taarunforge/spectraqual

taarunforge committed on Apr 8

Commit dfbb493 (1 parent: bd03bab)

Deploy SpectraQual OpenEnv environment

Files changed (24)

Dockerfile +37 -0
README.md +232 -9
inference.py +293 -0
openenv.yaml +93 -0
requirements.txt +21 -0
src/__pycache__/agent.cpython-314.pyc +0 -0
src/__pycache__/app.cpython-314.pyc +0 -0
src/__pycache__/config.cpython-314.pyc +0 -0
src/__pycache__/env.cpython-314.pyc +0 -0
src/__pycache__/environment.cpython-314.pyc +0 -0
src/__pycache__/models.cpython-314.pyc +0 -0
src/__pycache__/reward.cpython-314.pyc +0 -0
src/__pycache__/tasks.cpython-314.pyc +0 -0
src/agent.py +67 -0
src/api.py +58 -0
src/app.py +674 -0
src/config.py +116 -0
src/env.py +358 -0
src/main.py +28 -0
src/models.py +140 -0
src/reward.py +288 -0
src/tasks.py +262 -0
src/train.py +31 -0
verify.py +36 -0

Dockerfile ADDED

	@@ -0,0 +1,37 @@

+# ── Base image ───────────────────────────────────────────────────────────────
+FROM python:3.11-slim
+# ── Metadata ─────────────────────────────────────────────────────────────────
+LABEL maintainer="SpectraQual Team"
+LABEL description="SpectraQual — PCB Quality Control OpenEnv Environment"
+LABEL version="1.0.0"
+# ── System deps ───────────────────────────────────────────────────────────────
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    curl \
+    && rm -rf /var/lib/apt/lists/*
+# ── Working directory ─────────────────────────────────────────────────────────
+WORKDIR /app
+# ── Install Python dependencies first (layer cache) ──────────────────────────
+COPY requirements.txt .
+RUN pip install --no-cache-dir -r requirements.txt
+# ── Copy source code ──────────────────────────────────────────────────────────
+COPY . .
+# ── Environment variables (overridden at runtime) ─────────────────────────────
+ENV API_BASE_URL="https://openrouter.ai/api/v1"
+ENV MODEL_NAME="meta-llama/llama-3.3-70b-instruct"
+ENV HF_TOKEN=""
+# ── Expose API port (HF Spaces default) ───────────────────────────────────────
+EXPOSE 7860
+# ── Health check ──────────────────────────────────────────────────────────────
+HEALTHCHECK --interval=30s --timeout=10s --start-period=15s --retries=3 \
+    CMD curl -f http://localhost:7860/ || exit 1
+# ── Default command: launch FastAPI server ───────────────────────────────
+CMD ["uvicorn", "src.api:app", "--host", "0.0.0.0", "--port", "7860"]

README.md CHANGED

@@ -1,12 +1,235 @@
 ---
-title: Spectraqual
-emoji: 😻
-colorFrom: pink
-colorTo: indigo
-sdk: docker
-pinned: false
-license: mit
-short_description: "PCB quality-control triage OpenEnv environment\t"
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+# SpectraQual — PCB Smart Quality-Control OpenEnv Environment
+[![OpenEnv](https://img.shields.io/badge/OpenEnv-Compliant-00e5ff?style=flat-square)](https://github.com/openenv)
+[![Python](https://img.shields.io/badge/Python-3.11-blue?style=flat-square)](https://python.org)
+[![License](https://img.shields.io/badge/License-MIT-green?style=flat-square)](LICENSE)
+**SpectraQual** is a real-world AI environment that simulates smart quality-control triage for Printed Circuit Boards (PCBs) in a manufacturing factory.
+An AI agent receives a stream of PCBs, each with a different defect type, component cost, and criticality score. The agent must choose the optimal economic action (Pass, Scrap, Route to Repair, Wait) while managing a shared factory soldering slot queue.
+> **Why this problem matters:** PCB triage is a real, high-stakes manufacturing task. Wrong decisions mean wasted boards, bottlenecked production lines, and downstream electronics failures. SpectraQual models this as an RL environment where an agent must balance economic value, operational constraints, and risk — a setting where LLM agents can be meaningfully evaluated.
+---
+## Environment Overview
+| Property | Value |
+|---|---|
+| **Domain** | Smart Manufacturing / Industrial AI |
+| **Tasks** | 3 (Easy → Hard) |
+| **Action Space** | 6 discrete actions |
+| **Observation Space** | 13 fields (typed Pydantic model) |
+| **Reward Range** | `[0.0, 1.0]` normalized |
+| **Reward Signal** | Dense (per-step), 5 components |
+| **Seeded / Reproducible** | Yes |
+| **Anomaly Detection** | Yes |
+| **OpenEnv Spec** | Compliant |
 ---
+## Action Space
+| Action | Description | Valid When |
+|---|---|---|
+| `PASS` | Clear the board — no defect | `defect_type = none` |
+| `SCRAP` | Discard the board | Any defect |
+| `ROUTE_COMPONENT_REPLACEMENT` | Send to component repair | `missing_component` |
+| `ROUTE_SOLDERING` | Send to soldering station | `solder_bridge` |
+| `ROUTE_DIAGNOSTICS` | Send for investigation | `short_circuit` |
+| `WAIT` | Hold board until slot free | `solder_bridge` (no slot) |
 ---
+## Observation Space
+```python
+class PCBObservation(BaseModel):
+    board_id: str                   # Unique PCB ID (e.g. "SQ-4321")
+    defect_type: str                # "none" | "missing_component" | "solder_bridge" | "short_circuit"
+    component_cost: float           # Replacement cost ₹10–200
+    criticality: float              # Risk score 0.1–1.0
+    slots_free: int                 # Available soldering slots
+    slots_state: List[int]          # Remaining time per slot (0=free, -1=locked)
+    is_anomaly: bool                # True if board is rare/extreme
+    anomaly_score: float            # Anomaly confidence 0.0–1.0
+    valid_actions: List[str]        # Permitted actions for this defect
+    rolling_accuracy: float         # Fraction of correct decisions so far
+    throughput: float               # Boards/step so far
+    cumulative_reward: float        # Episode cumulative reward
+    step: int                       # Current step number
+```
+---
+## Reward Function
+Reward is **dense** (given every step) and **decomposed into 5 interpretable components**, all normalized to `[0.0, 1.0]`:
+| Component | Weight | Description |
+|---|---|---|
+| `defect_reward` | 35% | Correctness of the action for the defect type |
+| `cost_efficiency` | 25% | Economic value retained vs. lost |
+| `queue_penalty` | 20% | Factory bottleneck avoidance |
+| `criticality_factor` | 10% | Risk-adjusted modifier |
+| `anomaly_bonus` | 10% | Correct handling of anomalous boards |
+**Final reward** = weighted sum of all 5 components, clamped to `[0.0, 1.0]`.
+Every `StepResult` includes a `RewardComponents` object whose `explanation` field states why the reward was given.
+---
+## Tasks
+### Task Easy (`task_easy`)
+- **Boards:** 10 | **Seed:** 42 | **Slots:** 3 | **Anomaly Rate:** 0%
+- **Objective:** Correctly classify all defect types. No slot pressure.
+- **Grader:** `0.70 × accuracy + 0.30 × avg_reward`
+- **Expected frontier model score:** ≥ 0.85
+### Task Medium (`task_medium`)
+- **Boards:** 15 | **Seed:** 99 | **Slots:** 1 | **Anomaly Rate:** 10%
+- **Objective:** Triage boards with one soldering slot — manage queue pressure.
+- **Grader:** `0.60 × economic_efficiency + 0.40 × bottleneck_avoidance`
+- **Expected frontier model score:** ≥ 0.65
+### Task Hard (`task_hard`)
+- **Boards:** 20 | **Seed:** 777 | **Slots:** 1 | **Anomaly Rate:** 25%
+- **Objective:** Handle anomalous boards safely AND maintain throughput with tight slots.
+- **Grader:** `0.50 × anomaly_score + 0.30 × economic_score + 0.20 × throughput_score`
+- **Expected frontier model score:** ≥ 0.50
+---
+## Setup & Usage
+### Prerequisites
+```bash
+# Requires Python >= 3.11
+pip install -r requirements.txt
+```
+### 1) Launch the Streamlit Dashboard
+```bash
+streamlit run src/app.py
+```
+### 2) Run the LLM Inference Script
+```bash
+# Set environment variables
+export API_BASE_URL="https://openrouter.ai/api/v1"
+export MODEL_NAME="meta-llama/llama-3.3-70b-instruct"
+export HF_TOKEN="hf_your_key_here"
+# Run baseline inference
+python inference.py
+```
+### 3) Run Task Grader Sanity Check
+```bash
+cd src
+python tasks.py
+```
+### 4) Train the Q-learning Agent
+```bash
+python src/train.py
+```
+### 5) Run CLI Simulation (rule-based)
+```bash
+python src/main.py
+```
+---
+## Docker
+```bash
+# Build
+docker build -t spectraqual .
+# Run the API server (default — what HF Spaces runs)
+# Exposes: GET / | POST /reset | POST /step | GET /state
+docker run -p 7860:7860 spectraqual
+# → API docs available at http://localhost:7860/docs
+# Run inference inside container
+docker run \
+  -e API_BASE_URL="https://openrouter.ai/api/v1" \
+  -e MODEL_NAME="meta-llama/llama-3.3-70b-instruct" \
+  -e HF_TOKEN="hf_..." \
+  --entrypoint python spectraqual inference.py
+# Run Streamlit dashboard locally (NOT the Docker default — local dev only)
+streamlit run src/app.py --server.port 8501
+```
+---
+## Project Structure
+```
+spectraqual/
+├── inference.py          # Root LLM baseline script (required by OpenEnv)
+├── openenv.yaml          # OpenEnv spec metadata
+├── Dockerfile            # Container definition
+├── requirements.txt      # Version-bounded dependencies
+├── README.md             # This file
+└── src/
+    ├── config.py         # All constants, task configs, reward weights
+    ├── models.py         # Pydantic typed models (Observation, Action, Reward)
+    ├── env.py            # SpectraQualEnv class (reset/step/state + legacy wrappers)
+    ├── reward.py         # Multi-component normalized reward calculator
+    ├── tasks.py          # 3 tasks + programmatic graders
+    ├── agent.py          # Q-learning agent (baseline model zoo)
+    ├── api.py            # FastAPI server (health check, /reset, /step, /state)
+    ├── app.py            # Streamlit dashboard
+    ├── train.py          # Offline Q-table trainer
+    └── main.py           # Rule-based CLI runner
+```
+---
+## Baseline Scores
+| Agent | task_easy | task_medium | task_hard |
+|---|---|---|---|
+| Rule-based | ~0.82 | ~0.61 | ~0.48 |
+| LLM (llama-3.3-70b) | TBD | TBD | TBD |
+| Q-learning (trained) | TBD | TBD | TBD |
+---
+## Research Extensions
+The environment supports:
+- **Anomaly detection mode**: boards with extreme cost+criticality are flagged
+- **Seeded reproducibility**: every task uses a fixed RNG seed
+- **Pluggable agents**: any agent implementing `predict(observation) → action`
+- **Dense reward signal**: sub-rewards for debugging and ablation studies
+- **Explainability**: every step reward comes with a natural-language explanation
+- **Benchmark modes**: noisy observations, partial observability (planned)
+---
+## Environment Variables for Inference
+| Variable | Required | Default | Description |
+|---|---|---|---|
+| `API_BASE_URL` | No | `https://openrouter.ai/api/v1` | LLM API endpoint |
+| `MODEL_NAME` | No | `meta-llama/llama-3.3-70b-instruct` | Model identifier |
+| `HF_TOKEN` | Yes (prod) | — | Hugging Face / API key |
+| `LOCAL_IMAGE_NAME` | No | — | Docker image (for containerized env) |
+---
+## License
+MIT License — see [LICENSE](LICENSE).
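
Note on the task graders described in README.md: the three grader formulas are plain weighted sums. The sketch below restates them in Python, assuming every input metric is already normalized to [0.0, 1.0]. The function names are hypothetical; the actual graders live in src/tasks.py, which is not shown in this diff and may compute the inputs differently.

```python
def grade_easy(accuracy: float, avg_reward: float) -> float:
    """task_easy: 0.70 x accuracy + 0.30 x avg_reward."""
    return 0.70 * accuracy + 0.30 * avg_reward


def grade_medium(economic_efficiency: float, bottleneck_avoidance: float) -> float:
    """task_medium: 0.60 x economic_efficiency + 0.40 x bottleneck_avoidance."""
    return 0.60 * economic_efficiency + 0.40 * bottleneck_avoidance


def grade_hard(anomaly_score: float, economic_score: float, throughput_score: float) -> float:
    """task_hard: 0.50 x anomaly + 0.30 x economic + 0.20 x throughput."""
    return 0.50 * anomaly_score + 0.30 * economic_score + 0.20 * throughput_score


if __name__ == "__main__":
    # Illustrative inputs only; these are not measured results.
    print(round(grade_easy(0.9, 0.8), 4))
    print(round(grade_medium(0.7, 0.5), 4))
    print(round(grade_hard(0.6, 0.5, 0.4), 4))
```

Because the weights in each formula sum to 1.0, a grader score stays in [0.0, 1.0] whenever its inputs do.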

inference.py ADDED

	@@ -0,0 +1,293 @@

+"""
+inference.py — SpectraQual OpenEnv Baseline Inference Script
+Runs an LLM agent against all 3 SpectraQual tasks and emits structured logs.
+Environment variables (set before running):
+    API_BASE_URL   The LLM API endpoint  (default: https://openrouter.ai/api/v1)
+    MODEL_NAME     Model identifier      (default: meta-llama/llama-3.3-70b-instruct)
+    HF_TOKEN       Your Hugging Face / API key (required in production)
+Usage:
+    export HF_TOKEN="hf_xxx..."
+    python inference.py
+Output format:
+    [START] task=<id> env=SpectraQual model=<model>
+    [STEP] step=<n> action=<A> reward=<r> done=<True|False> error=<null|"msg">
+    [END] success=<True|False> steps=<n> score=<f> rewards=[...]
+"""
+from __future__ import annotations
+import json
+import os
+import sys
+import time
+from typing import List, Optional
+# ── Path setup so we can import from src/ ──────────────────────────────────
+ROOT_DIR = os.path.dirname(os.path.abspath(__file__))
+SRC_DIR  = os.path.join(ROOT_DIR, "src")
+sys.path.insert(0, SRC_DIR)
+from openai import OpenAI
+from env   import SpectraQualEnv
+from models import PCBAction, StepResult
+from config import (
+    ACTIONS,
+    VALID_ACTIONS,
+    MAX_STEPS_PER_TASK,
+    SUCCESS_SCORE_THRESHOLD,
+    TEMPERATURE,
+    MAX_TOKENS,
+    TASKS,
+)
+from tasks import TASK_DESCRIPTIONS, run_task, grade
+# ── Environment variables ──────────────────────────────────────────────────
+API_BASE_URL = os.getenv("API_BASE_URL", "https://openrouter.ai/api/v1")
+MODEL_NAME   = os.getenv("MODEL_NAME",   "meta-llama/llama-3.3-70b-instruct")
+HF_TOKEN     = os.getenv("HF_TOKEN")
+API_KEY      = HF_TOKEN or os.getenv("OPENAI_API_KEY", "no-key-set")
+# Optional: if you use from_docker_image() style containerized env
+LOCAL_IMAGE_NAME = os.getenv("LOCAL_IMAGE_NAME")
+BENCHMARK   = "SpectraQual"
+TASK_IDS    = ["task_easy", "task_medium", "task_hard"]
+# ── System prompt for the LLM ──────────────────────────────────────────────
+SYSTEM_PROMPT = """You are a PCB quality-control triage agent.
+You will receive information about a printed circuit board (PCB) including its defect type,
+component cost, criticality score, and available factory soldering slots.
+You must choose exactly ONE action from the allowed list.
+Respond with ONLY the action name — no explanation, no extra text, no punctuation.
+Action meanings:
+- PASS                       → Board has no defect; clear it.
+- SCRAP                      → Board is too damaged or high-risk; discard it.
+- ROUTE_COMPONENT_REPLACEMENT → Board has a missing component; route to repair.
+- ROUTE_SOLDERING             → Board has a solder bridge; send to soldering station.
+- ROUTE_DIAGNOSTICS           → Board has an ambiguous fault; send for investigation.
+- WAIT                        → No soldering slot available; hold the board.
+Rules:
+- For defect_type=none, you MUST respond PASS.
+- For defect_type=missing_component, choose ROUTE_COMPONENT_REPLACEMENT or SCRAP.
+- For defect_type=solder_bridge, choose ROUTE_SOLDERING, WAIT, or SCRAP.
+- For defect_type=short_circuit, choose SCRAP or ROUTE_DIAGNOSTICS.
+- If slots_free=0 and action=ROUTE_SOLDERING would apply, prefer WAIT instead.
+Respond with only one word. Example: ROUTE_SOLDERING"""
+# ── Prompt builder ─────────────────────────────────────────────────────────
+def build_user_prompt(
+    obs,
+    step: int,
+    last_reward: float,
+    history: List[str],
+) -> str:
+    history_txt = "\n".join(history[-5:]) if history else "None"
+    anomaly_txt = f"⚠️ ANOMALY DETECTED (score={obs.anomaly_score:.2f})" if obs.is_anomaly else "Normal"
+    return f"""=== PCB TRIAGE — Step {step} ===
+Board ID:       {obs.board_id}
+Defect Type:    {obs.defect_type}
+Component Cost: ₹{obs.component_cost:.2f}
+Criticality:    {obs.criticality:.2f}
+Slots Free:     {obs.slots_free} / {len(obs.slots_state)}
+Slot State:     {obs.slots_state}
+Anomaly:        {anomaly_txt}
+Valid Actions:  {", ".join(obs.valid_actions)}
+Last Reward:    {last_reward:.4f}
+Cumulative:     {obs.cumulative_reward:.4f}
+Accuracy:       {obs.rolling_accuracy:.2%}
+Recent History:
+{history_txt}
+Choose exactly one action from: {", ".join(obs.valid_actions)}"""
+# ── Structured log helpers ─────────────────────────────────────────────────
+def log_start(task: str, env: str, model: str) -> None:
+    print(
+        f"[START] task={task} env={env} model={model}",
+        flush=True,
+    )
+def log_step(
+    step: int,
+    action: str,
+    reward: float,
+    done: bool,
+    error: Optional[str],
+) -> None:
+    error_val = "null" if error is None else f'"{error}"'
+    print(
+        f"[STEP] step={step} action={action} reward={reward:.4f} done={done} error={error_val}",
+        flush=True,
+    )
+def log_end(
+    success: bool,
+    steps: int,
+    score: float,
+    rewards: List[float],
+) -> None:
+    rewards_str = json.dumps([round(r, 4) for r in rewards])
+    print(
+        f"[END] success={success} steps={steps} score={score:.4f} rewards={rewards_str}",
+        flush=True,
+    )
+# ── LLM call ──────────────────────────────────────────────────────────────
+def get_llm_action(
+    client: OpenAI,
+    obs,
+    step: int,
+    last_reward: float,
+    history: List[str],
+) -> str:
+    """Ask the LLM for a triage action. Falls back to SCRAP on any error."""
+    prompt = build_user_prompt(obs, step, last_reward, history)
+    try:
+        completion = client.chat.completions.create(
+            model=MODEL_NAME,
+            messages=[
+                {"role": "system", "content": SYSTEM_PROMPT},
+                {"role": "user",   "content": prompt},
+            ],
+            temperature=TEMPERATURE,
+            max_tokens=MAX_TOKENS,
+            stream=False,
+        )
+        raw = (completion.choices[0].message.content or "").strip().upper()
+        # Validate: pick first word that matches a known action
+        for candidate in raw.split():
+            candidate = candidate.strip(".,;:!?\"'")
+            if candidate in ACTIONS:
+                return candidate
+        # Fallback: try to find partial match
+        for action in ACTIONS:
+            if action in raw:
+                return action
+        print(f"[DEBUG] Unexpected model output: {raw!r}", flush=True)
+        return "SCRAP"
+    except Exception as exc:
+        print(f"[DEBUG] LLM request failed: {exc}", flush=True)
+        return "SCRAP"
+# ── Single task runner ─────────────────────────────────────────────────────
+def run_task_inference(client: OpenAI, task_id: str) -> tuple[bool, int, float, List[float]]:
+    """
+    Run the LLM agent against one task.
+    Returns (success, steps_taken, score, rewards_list).
+    """
+    cfg         = TASKS[task_id]
+    max_steps   = min(cfg["n_boards"] + 5, MAX_STEPS_PER_TASK)
+    total_reward_cap = cfg["n_boards"] * 1.0   # max possible (1.0 per step)
+    env          = SpectraQualEnv(task_id=task_id)
+    history:    List[str]  = []
+    rewards:    List[float] = []
+    action_log: List[str]  = []
+    steps_taken  = 0
+    score        = 0.0
+    success      = False
+    log_start(task=task_id, env=BENCHMARK, model=MODEL_NAME)
+    try:
+        result = env.reset()
+        obs         = result.observation
+        last_reward = 0.0
+        for step in range(1, max_steps + 1):
+            if result.done:
+                break
+            # Get action from LLM
+            action_str = get_llm_action(client, obs, step, last_reward, history)
+            action_log.append(action_str)
+            error = None
+            try:
+                result = env.step(PCBAction(action=action_str))
+            except Exception as e:
+                error = str(e)
+                result = env.step(PCBAction(action="SCRAP"))
+            obs         = result.observation
+            reward      = result.reward
+            done        = result.done
+            last_reward = reward
+            rewards.append(reward)
+            steps_taken = step
+            log_step(step=step, action=action_str, reward=reward, done=done, error=error)
+            history.append(
+                f"Step {step}: {action_str!r} → reward={reward:.4f}"
+            )
+            if done:
+                break
+        # Score = average normalized reward across all steps
+        score = sum(rewards) / max(len(rewards), 1)
+        score = min(max(score, 0.0), 1.0)
+        success = score >= SUCCESS_SCORE_THRESHOLD
+    except Exception as exc:
+        print(f"[DEBUG] Task runner error: {exc}", flush=True)
+    finally:
+        log_end(success=success, steps=steps_taken, score=score, rewards=rewards)
+    return success, steps_taken, score, rewards
+# ── Main ──────────────────────────────────────────────────────────────────
+def main() -> None:
+    print(f"[DEBUG] API_BASE_URL = {API_BASE_URL}", flush=True)
+    print(f"[DEBUG] MODEL_NAME   = {MODEL_NAME}",   flush=True)
+    print(f"[DEBUG] HF_TOKEN     = {'SET' if HF_TOKEN else 'NOT SET (using OPENAI_API_KEY fallback)'}", flush=True)
+    print("", flush=True)
+    client = OpenAI(base_url=API_BASE_URL, api_key=API_KEY)
+    all_scores: List[float] = []
+    for task_id in TASK_IDS:
+        print(f"\n{'='*60}", flush=True)
+        print(f"[DEBUG] Starting {task_id} | {TASK_DESCRIPTIONS[task_id][:80]}...", flush=True)
+        print(f"{'='*60}\n", flush=True)
+        success, steps, score, rewards = run_task_inference(client, task_id)
+        all_scores.append(score)
+        print(f"\n[DEBUG] {task_id} complete — score={score:.4f} success={success}\n", flush=True)
+        time.sleep(1)   # brief pause between tasks
+    overall = sum(all_scores) / len(all_scores) if all_scores else 0.0
+    print(f"\n{'='*60}", flush=True)
+    print(f"[SUMMARY] Overall score={overall:.4f}", flush=True)
+    print(f"[SUMMARY] Per-task: { {tid: round(s, 4) for tid, s in zip(TASK_IDS, all_scores)} }", flush=True)
+    print(f"{'='*60}\n", flush=True)
+if __name__ == "__main__":
+    main()
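
Note on the structured logs emitted by inference.py: downstream tooling will likely need to read the `[STEP]` lines back. The sketch below parses one such line, assuming the exact field order printed by `log_step` (step, action, reward, done, error) and Python-style `True`/`False` booleans. The name `parse_step_line` is hypothetical and not part of this commit.

```python
import re
from typing import Optional

# Mirrors the f-string in log_step(); \s+ tolerates extra spacing after the tag.
STEP_RE = re.compile(
    r"^\[STEP\]\s+step=(?P<step>\d+)\s+action=(?P<action>\S+)\s+"
    r"reward=(?P<reward>-?\d+(?:\.\d+)?)\s+done=(?P<done>True|False)\s+"
    r"error=(?P<error>.*)$"
)


def parse_step_line(line: str) -> Optional[dict]:
    """Return the fields of a [STEP] log line, or None if the line does not match."""
    match = STEP_RE.match(line.strip())
    if match is None:
        return None
    error = match.group("error")
    return {
        "step": int(match.group("step")),
        "action": match.group("action"),
        "reward": float(match.group("reward")),
        "done": match.group("done") == "True",
        # log_step prints the literal null when there is no error, else a quoted message
        "error": None if error == "null" else error.strip('"'),
    }


if __name__ == "__main__":
    print(parse_step_line(
        "[STEP] step=3 action=ROUTE_SOLDERING reward=0.8125 done=False error=null"
    ))
```

Lines that are not `[STEP]` records (for example `[DEBUG]` or `[END]` output) return `None`, so the parser can be applied to the full log stream.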

openenv.yaml ADDED

	@@ -0,0 +1,93 @@

+name: spectraqual
+version: "1.0.0"
+description: >
+  SpectraQual is a smart PCB (Printed Circuit Board) quality-control triage
+  environment. An AI agent processes a stream of boards with randomized defects,
+  choosing the optimal economic and operational action under factory slot
+  constraints. Rewards are decomposed into 5 interpretable components and
+  normalized to [0.0, 1.0] for clean agent training.
+author: "SpectraQual Team"
+tags:
+  - pcb
+  - manufacturing
+  - quality-control
+  - industrial-ai
+  - real-world
+  - openenv
+# Action space
+actions:
+  - PASS
+  - SCRAP
+  - ROUTE_COMPONENT_REPLACEMENT
+  - ROUTE_SOLDERING
+  - ROUTE_DIAGNOSTICS
+  - WAIT
+# Observation fields
+observations:
+  - board_id: "Unique PCB identifier (string)"
+  - defect_type: "none | missing_component | solder_bridge | short_circuit"
+  - component_cost: "Replacement cost in ₹ (10.0–200.0)"
+  - criticality: "Risk score (0.1–1.0)"
+  - slots_free: "Available soldering slots (0–3)"
+  - slots_state: "Time remaining per slot (list of ints)"
+  - is_anomaly: "True if board has extreme cost+criticality"
+  - anomaly_score: "Anomaly confidence (0.0–1.0)"
+  - valid_actions: "Actions permitted for this defect type"
+  - rolling_accuracy: "Fraction of correct decisions so far"
+  - throughput: "Boards processed per step"
+  - cumulative_reward: "Episode cumulative normalized reward"
+  - step: "Current step number"
+# Reward range
+reward:
+  min: 0.0
+  max: 1.0
+  components:
+    - defect_reward: "Correctness of decision for defect type (weight=0.35)"
+    - cost_efficiency: "Economic value retained vs lost (weight=0.25)"
+    - queue_penalty: "Factory bottleneck avoidance (weight=0.20)"
+    - criticality_factor: "Risk-adjusted modifier (weight=0.10)"
+    - anomaly_bonus: "Correct handling of anomalous boards (weight=0.10)"
+# Tasks
+tasks:
+  - id: task_easy
+    description: "Triage 10 boards with no slot pressure. Seed=42."
+    difficulty: easy
+    n_boards: 10
+    seed: 42
+    n_slots: 3
+    anomaly_rate: 0.0
+    expected_score: 0.85
+  - id: task_medium
+    description: "Triage 15 boards with 1 soldering slot (queue pressure). Seed=99."
+    difficulty: medium
+    n_boards: 15
+    seed: 99
+    n_slots: 1
+    anomaly_rate: 0.10
+    expected_score: 0.65
+  - id: task_hard
+    description: "Triage 20 boards with 25% anomaly rate and tight slot constraints. Seed=777."
+    difficulty: hard
+    n_boards: 20
+    seed: 777
+    n_slots: 1
+    anomaly_rate: 0.25
+    expected_score: 0.50
+# Interface compliance
+interface:
+  reset: "Returns a StepResult wrapping the initial PCBObservation (no reward yet)"
+  step: "Takes PCBAction, returns StepResult (observation, reward, done, info)"
+  state: "Returns full internal environment state as dict"
+# Deployment
+deployment:
+  hf_space: "TAARUNEESHWARAN-027/spectraqual"
+  port: 7860
+  runtime: fastapi
+  api_docs: "/docs"
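
Note on the reward block in openenv.yaml: the five weights (0.35, 0.25, 0.20, 0.10, 0.10) sum to 1.0 and the final reward is described as a clamped weighted sum. The sketch below shows that combination, assuming each component is itself a score in [0.0, 1.0] where higher is better (including `queue_penalty`). The name `combine_reward` is hypothetical; the real calculator is src/reward.py, which is not shown in this diff.

```python
from typing import Dict

# Weights copied from the reward.components entries in openenv.yaml.
WEIGHTS: Dict[str, float] = {
    "defect_reward": 0.35,
    "cost_efficiency": 0.25,
    "queue_penalty": 0.20,
    "criticality_factor": 0.10,
    "anomaly_bonus": 0.10,
}


def combine_reward(components: Dict[str, float]) -> float:
    """Weighted sum of the five components, clamped to [0.0, 1.0]."""
    total = sum(weight * components[name] for name, weight in WEIGHTS.items())
    return min(max(total, 0.0), 1.0)


if __name__ == "__main__":
    # Illustrative component values only; not taken from a real episode.
    example = {
        "defect_reward": 1.0,
        "cost_efficiency": 0.8,
        "queue_penalty": 0.5,
        "criticality_factor": 0.6,
        "anomaly_bonus": 0.0,
    }
    print(round(combine_reward(example), 4))
```

With these illustrative values the sum is 0.35 + 0.20 + 0.10 + 0.06 + 0.00 = 0.71, so the clamp only matters if a component falls outside its expected range.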

requirements.txt ADDED

	@@ -0,0 +1,21 @@

+# SpectraQual dependencies
+# Version ranges bounded for reproducibility on vcpu=2, memory=8GB machines
+# Core environment
+pydantic>=2.0.0,<3.0.0
+# Streamlit dashboard
+streamlit>=1.32.0,<2.0.0
+# Plotting
+matplotlib>=3.8.0,<4.0.0
+# LLM inference (OpenAI-compatible client)
+openai>=1.0.0,<2.0.0
+# HTTP client (used by openai SDK)
+httpx>=0.25.0,<1.0.0
+# API Endpoints
+fastapi>=0.100.0,<1.0.0
+uvicorn>=0.23.0,<1.0.0

src/__pycache__/agent.cpython-314.pyc ADDED

Binary file (2.53 kB)

src/__pycache__/app.cpython-314.pyc ADDED

Binary file (33.5 kB)

src/__pycache__/config.cpython-314.pyc ADDED

Binary file (2.26 kB)

src/__pycache__/env.cpython-314.pyc ADDED

Binary file (16.9 kB)

src/__pycache__/environment.cpython-314.pyc ADDED

Binary file (3.68 kB)

src/__pycache__/models.cpython-314.pyc ADDED

Binary file (6 kB)

src/__pycache__/reward.cpython-314.pyc ADDED

Binary file (12.5 kB)

src/__pycache__/tasks.cpython-314.pyc ADDED

Binary file (9.9 kB)

src/agent.py ADDED

	@@ -0,0 +1,67 @@

+import random
+# Q-table
+Q = {}
+# Actions
+ACTIONS = [
+    "PASS",
+    "SCRAP",
+    "ROUTE_COMPONENT_REPLACEMENT",
+    "ROUTE_SOLDERING",
+    "ROUTE_DIAGNOSTICS",
+    "WAIT"
+]
+def get_valid_actions(defect):
+    if defect == "none":
+        return ["PASS"]
+    if defect == "missing_component":
+        return ["ROUTE_COMPONENT_REPLACEMENT", "SCRAP"]
+    if defect == "solder_bridge":
+        return ["ROUTE_SOLDERING", "WAIT", "SCRAP"]
+    if defect == "short_circuit":
+        return ["SCRAP", "ROUTE_DIAGNOSTICS"]
+    return ["SCRAP"]
+# Convert PCB → STATE
+def get_state(pcb, factory):
+    slots_free = factory["soldering_slots"].count(0)
+    return (
+        pcb["defect_type"],
+        round(pcb["component_cost"] / 50),   # bucket cost
+        round(pcb["criticality"], 1),
+        slots_free
+    )
+# Initialize state
+def init_state(state):
+    if state not in Q:
+        Q[state] = {a: 0 for a in ACTIONS}
+# Epsilon-Greedy policy
+def choose_action(state, epsilon=0.3):
+    init_state(state)
+    defect = state[0]
+    valid_actions = get_valid_actions(defect)
+    # Exploration
+    if random.random() < epsilon:
+        return random.choice(valid_actions)
+    # Exploitation (best action among valid ones)
+    return max(valid_actions, key=lambda a: Q[state][a])
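+# Q-learning update
+def update_q(state, action, reward, next_state, alpha=0.1, gamma=0.9):
+    # Ensure both states have Q-table rows before reading or writing them
+    init_state(state)
+    init_state(next_state)
+    old = Q[state][action]
+    # Best estimated value reachable from the next state
+    future = max(Q[next_state].values())
+    # Temporal-difference step toward reward + discounted future value
+    Q[state][action] = old + alpha * (reward + gamma * future - old)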
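
Note on the update rule in src/agent.py: the last line of `update_q` is the standard tabular Q-learning step, Q(s, a) <- Q(s, a) + alpha * (r + gamma * max Q(s', .) - Q(s, a)). The sketch below isolates that arithmetic in a pure function so it can be checked by hand. The name `q_update` is hypothetical; the defaults alpha=0.1 and gamma=0.9 are the ones used in the file.

```python
def q_update(old: float, reward: float, future: float,
             alpha: float = 0.1, gamma: float = 0.9) -> float:
    """Return the new Q-value after one temporal-difference step."""
    td_target = reward + gamma * future   # reward plus discounted best next value
    return old + alpha * (td_target - old)


if __name__ == "__main__":
    # Fresh state: Q starts at 0 and the next state is also unvisited.
    print(round(q_update(old=0.0, reward=1.0, future=0.0), 4))
    # Partially learned state with a promising next state.
    print(round(q_update(old=0.5, reward=0.8, future=1.0), 4))
```

In the first case the value moves one tenth of the way from 0 to the target 1.0, giving 0.1. In the second the target is 0.8 + 0.9 * 1.0 = 1.7, so the value moves from 0.5 to 0.5 + 0.1 * 1.2 = 0.62.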

src/api.py ADDED

	@@ -0,0 +1,58 @@

+from fastapi import FastAPI, HTTPException
+import sys
+import os
+# Add src to path so standard imports work
+sys.path.insert(0, os.path.dirname(__file__))
+from env import SpectraQualEnv
+from models import PCBAction, StepResult
+app = FastAPI(
+    title="SpectraQual OpenEnv API",
+    description="REST API for automated OpenEnv space evaluation",
+    version="1.0.0"
+)
+# Single shared environment instance.
+# A production evaluator would likely create one environment per session;
+# a global instance is enough for the automated "ping Space URL" check.
+env_instance = SpectraQualEnv(task_id="task_easy")
+@app.get("/")
+def health_check():
+    """Returns 200 to pass automated ping test."""
+    return {"status": "ok", "environment": "SpectraQual"}
+@app.post("/reset")
+def reset_env() -> StepResult:
+    """Reset the environment and return initial observation."""
+    try:
+        return env_instance.reset()
+    except Exception as e:
+        raise HTTPException(status_code=500, detail=str(e))
+@app.post("/step")
+def step_env(action: PCBAction) -> StepResult:
+    """Take a step in the environment."""
+    try:
+        if env_instance.state()["done"]:
+            # The episode has finished: reject the step with a 400 instead of auto-resetting.
+            raise HTTPException(status_code=400, detail="Episode is done. Call /reset first.")
+        return env_instance.step(action)
+    except HTTPException:
+        # Re-raise deliberate HTTP errors so the 400 above is not rewrapped as a 500.
+        raise
+    except Exception as e:
+        raise HTTPException(status_code=500, detail=str(e))
+@app.get("/state")
+def get_state():
+    """Return the internal state of the environment."""
+    try:
+        return env_instance.state()
+    except Exception as e:
+        raise HTTPException(status_code=500, detail=str(e))
+if __name__ == "__main__":
+    import uvicorn
+    # Typically run via Docker CMD: uvicorn src.api:app --host 0.0.0.0 --port 7860
+    uvicorn.run(app, host="0.0.0.0", port=7860)
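
Note on error handling in src/api.py: FastAPI's `HTTPException` derives from `Exception`, so in a handler such as `/step` the order of `except` clauses decides whether a deliberate 400 reaches the client or is rewrapped as a 500. The sketch below reproduces that behavior without FastAPI. `HTTPError` is a hypothetical stand-in class, and both functions return a status code instead of raising so the difference is easy to observe.

```python
class HTTPError(Exception):
    """Hypothetical stand-in for fastapi.HTTPException (status code + detail)."""

    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


def step_generic_only(done: bool) -> int:
    """Only a generic handler: the intended 400 is caught and becomes a 500."""
    try:
        if done:
            raise HTTPError(400, "Episode is done. Call /reset first.")
        return 200
    except Exception as exc:
        return HTTPError(500, str(exc)).status_code


def step_reraise_first(done: bool) -> int:
    """Specific handler first: the intended 400 is preserved."""
    try:
        if done:
            raise HTTPError(400, "Episode is done. Call /reset first.")
        return 200
    except HTTPError as exc:
        return exc.status_code
    except Exception:
        return 500


if __name__ == "__main__":
    print(step_generic_only(done=True), step_reraise_first(done=True))
```

Clients that distinguish "episode finished, call /reset" from a genuine server fault depend on the second pattern.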

src/app.py ADDED

	@@ -0,0 +1,674 @@

+"""
+app.py — SpectraQual Streamlit Dashboard (v3.0)
+Updated to use the new SpectraQualEnv class with OpenEnv interface.
+Features:
+  - Real-time stacked reward component charts
+  - Per-step accuracy / throughput display
+  - Action confidence from reward components
+  - Anomaly flag indicators
+  - Explainability: "Why this decision?"
+"""
+import sys
+import os
+sys.path.insert(0, os.path.dirname(__file__))
+import streamlit as st
+import matplotlib.pyplot as plt
+import time
+from env    import SpectraQualEnv
+from models import PCBAction
+from config import (
+    COLOR_PRIMARY, COLOR_SUCCESS, COLOR_WARNING,
+    COLOR_DANGER,  COLOR_BG,     COLOR_CARD, COLOR_MUTED,
+    TASKS,
+)
+# ---------------------------
+# PAGE CONFIG
+# ---------------------------
+st.set_page_config(
+    page_title="SpectraQual",
+    page_icon="⚔️",
+    layout="wide",
+    initial_sidebar_state="collapsed",
+)
+# ---------------------------
+# GLOBAL STYLES
+# ---------------------------
+st.markdown("""
+<style>
+@import url('https://fonts.googleapis.com/css2?family=Share+Tech+Mono&family=Rajdhani:wght@500;600;700&family=Exo+2:wght@300;400;600;800&display=swap');
+.stApp {
+    background-color: #080c12;
+    color: #c9d4e0;
+    font-family: 'Exo 2', sans-serif;
+}
+.stApp::before {
+    content: '';
+    position: fixed;
+    inset: 0;
+    background: repeating-linear-gradient(0deg, rgba(0,0,0,0.025) 0px, rgba(0,0,0,0.025) 1px, transparent 1px, transparent 4px);
+    pointer-events: none;
+    z-index: 9999;
+}
+h1 {
+    font-family: 'Rajdhani', sans-serif !important;
+    font-weight: 700 !important;
+    font-size: 2.4rem !important;
+    letter-spacing: 0.12em !important;
+    color: #00e5ff !important;
+    text-shadow: 0 0 18px rgba(0,229,255,0.45), 0 0 40px rgba(0,229,255,0.12);
+    border-bottom: 1px solid rgba(0,229,255,0.15);
+    padding-bottom: 0.4rem;
+}
+h2, h3 {
+    font-family: 'Rajdhani', sans-serif !important;
+    font-weight: 600 !important;
+    font-size: 0.72rem !important;
+    letter-spacing: 0.14em !important;
+    color: #2e6a80 !important;
+    text-transform: uppercase;
+    margin-top: 1.4rem !important;
+    margin-bottom: 0.3rem !important;
+}
+[data-testid="metric-container"] {
+    background: linear-gradient(135deg, #0d1b2a, #09141f);
+    border: 1px solid rgba(0,229,255,0.15);
+    border-radius: 10px;
+    padding: 16px 20px !important;
+    box-shadow: 0 0 22px rgba(0,229,255,0.05), inset 0 1px 0 rgba(255,255,255,0.03);
+    transition: border-color 0.25s;
+}
+[data-testid="metric-container"]:hover { border-color: rgba(0,229,255,0.38); }
+[data-testid="stMetricLabel"] {
+    font-family: 'Share Tech Mono', monospace !important;
+    font-size: 0.68rem !important;
+    color: #2e6a80 !important;
+    letter-spacing: 0.12em;
+    text-transform: uppercase;
+}
+[data-testid="stMetricValue"] {
+    font-family: 'Rajdhani', sans-serif !important;
+    font-size: 2.1rem !important;
+    font-weight: 700 !important;
+    color: #00e5ff !important;
+}
+.stButton > button {
+    background: linear-gradient(135deg, #0d2137, #091824);
+    color: #00e5ff;
+    border: 1px solid rgba(0,229,255,0.3);
+    border-radius: 6px;
+    font-family: 'Rajdhani', sans-serif;
+    font-weight: 600;
+    letter-spacing: 0.1em;
+    font-size: 0.85rem;
+    padding: 9px 18px;
+    text-transform: uppercase;
+    transition: all 0.2s;
+    box-shadow: 0 0 10px rgba(0,229,255,0.06);
+    width: 100%;
+}
+.stButton > button:hover {
+    background: linear-gradient(135deg, #123450, #0d2538);
+    border-color: #00e5ff;
+    box-shadow: 0 0 18px rgba(0,229,255,0.22);
+    transform: translateY(-1px);
+}
+.stButton > button:active { transform: translateY(0); }
+.stSuccess, .stWarning, .stInfo, .stError {
+    border-radius: 8px !important;
+    font-family: 'Rajdhani', sans-serif !important;
+    font-size: 1.1rem !important;
+    font-weight: 600 !important;
+    letter-spacing: 0.05em;
+    border-left-width: 4px !important;
+}
+.stSuccess { background: rgba(0,230,118,0.07)  !important; border-color: #00e676 !important; }
+.stWarning { background: rgba(255,183,0,0.07)   !important; border-color: #ffb700 !important; }
+.stInfo    { background: rgba(0,229,255,0.06)   !important; border-color: #00e5ff !important; }
+.stError   { background: rgba(255,50,50,0.07)   !important; border-color: #ff3232 !important; }
+.pcb-card {
+    background: linear-gradient(135deg, #0d1b2a, #09141f);
+    border: 1px solid rgba(0,229,255,0.15);
+    border-radius: 10px;
+    padding: 18px 22px;
+    font-family: 'Share Tech Mono', monospace;
+    font-size: 0.82rem;
+    line-height: 2.1;
+    box-shadow: inset 0 0 24px rgba(0,0,0,0.25);
+}
+.lbl { color: #2e6a80; font-size: 0.68rem; letter-spacing: 0.12em; text-transform: uppercase; }
+.val { color: #c9f0ff; font-weight: 600; }
+.defect-badge {
+    display: inline-block;
+    padding: 1px 10px;
+    border-radius: 4px;
+    font-size: 0.72rem;
+    font-weight: 700;
+    letter-spacing: 0.08em;
+}
+.b-none    { background: rgba(0,230,118,0.12);  color: #00e676; border: 1px solid #00e676; }
+.b-missing { background: rgba(255,183,0,0.12);  color: #ffb700; border: 1px solid #ffb700; }
+.b-solder  { background: rgba(255,120,0,0.12);  color: #ff7800; border: 1px solid #ff7800; }
+.b-short   { background: rgba(255,50,50,0.12);  color: #ff3232; border: 1px solid #ff3232; }
+.anomaly-badge {
+    display: inline-block;
+    padding: 2px 12px;
+    border-radius: 4px;
+    font-size: 0.72rem;
+    font-weight: 700;
+    background: rgba(255,0,200,0.12);
+    color: #ff00c8;
+    border: 1px solid #ff00c8;
+    letter-spacing: 0.1em;
+    animation: anomalyPulse 1.2s ease-in-out infinite;
+}
+@keyframes anomalyPulse {
+    0%   { box-shadow: 0 0 4px rgba(255,0,200,0.2); }
+    50%  { box-shadow: 0 0 16px rgba(255,0,200,0.6); }
+    100% { box-shadow: 0 0 4px rgba(255,0,200,0.2); }
+}
+.slot-grid { display: flex; flex-wrap: wrap; gap: 8px; margin-top: 4px; }
+.slot-item {
+    display: flex; align-items: center; gap: 8px;
+    background: #0a1420; border-radius: 6px;
+    padding: 7px 13px;
+    font-family: 'Share Tech Mono', monospace;
+    font-size: 0.75rem;
+    border: 1px solid rgba(255,255,255,0.05);
+    min-width: 128px;
+}
+.dot { width:9px; height:9px; border-radius:50%; flex-shrink:0; }
+.dot-free { background:#00e676; box-shadow:0 0 7px #00e676; }
+.dot-busy { background:#ff3232; box-shadow:0 0 7px #ff3232; }
+.dot-lock { background:#3a3a3a; }
+.free { color:#00e676; }
+.busy { color:#ff5a5a; }
+.lock { color:#3a3a3a; }
+.rpill {
+    display: inline-block;
+    padding: 5px 20px;
+    border-radius: 20px;
+    font-family: 'Rajdhani', sans-serif;
+    font-size: 1.3rem;
+    font-weight: 700;
+    letter-spacing: 0.08em;
+}
+.rpos  { background:rgba(0,230,118,0.11); color:#00e676; border:1px solid rgba(0,230,118,0.35); }
+.rneg  { background:rgba(255,50,50,0.11);  color:#ff5a5a; border:1px solid rgba(255,50,50,0.35); }
+.rzero { background:rgba(140,140,140,0.09);color:#888;    border:1px solid rgba(140,140,140,0.25); }
+.score-big {
+    font-family: 'Rajdhani', sans-serif;
+    font-size: 2.4rem;
+    font-weight: 800;
+    letter-spacing: 0.05em;
+    text-shadow: 0 0 14px currentColor;
+}
+hr { border:none; border-top:1px solid rgba(0,229,255,0.08) !important; margin:1.2rem 0 !important; }
+.idle {
+    text-align:center; padding:44px 20px;
+    border:1px dashed rgba(0,229,255,0.15); border-radius:12px;
+    color:#1e4a5a; font-family:'Share Tech Mono',monospace;
+    font-size:0.8rem; letter-spacing:0.12em; margin-top:36px;
+}
+.reward-row {
+    display: flex; align-items: center; gap: 10px;
+    font-family: 'Share Tech Mono', monospace;
+    font-size: 0.74rem;
+    margin-bottom: 6px;
+}
+.reward-label { color: #2e6a80; width: 160px; flex-shrink: 0; }
+.reward-bar-wrap { flex: 1; background: #0a1420; border-radius: 4px; height: 8px; }
+.reward-bar { height: 8px; border-radius: 4px; }
+.reward-val { color: #c9f0ff; width: 48px; text-align: right; }
+[data-testid="stProgressBar"] > div > div {
+    background: linear-gradient(90deg, #0d5e70, #00e5ff) !important;
+    border-radius: 4px;
+}
+[data-testid="stProgressBar"] {
+    background: #0a1420 !important;
+    border: 1px solid rgba(0,229,255,0.12);
+    border-radius: 4px;
+}
+.stCaption {
+    font-family: 'Share Tech Mono', monospace !important;
+    font-size: 0.68rem !important;
+    color: #2e6a80 !important;
+    letter-spacing: 0.1em;
+}
+@keyframes pulseGlow {
+    0%   { box-shadow: 0 0 5px rgba(0,229,255,0.15); }
+    50%  { box-shadow: 0 0 22px rgba(0,229,255,0.45); }
+    100% { box-shadow: 0 0 5px rgba(0,229,255,0.15); }
+}
+.stSuccess, .stWarning, .stError, .stInfo {
+    animation: pulseGlow 1.5s ease-in-out infinite;
+}
+</style>
+""", unsafe_allow_html=True)
+# ---------------------------
+# SESSION STATE
+# ---------------------------
+def _init_state():
+    if "env" not in st.session_state:
+        st.session_state.env = None
+    if "score" not in st.session_state:
+        st.session_state.score = 0.0
+    if "history" not in st.session_state:
+        st.session_state.history = []         # cumulative reward over time
+    if "running" not in st.session_state:
+        st.session_state.running = False
+    if "log" not in st.session_state:
+        st.session_state.log = []             # list of (pcb_obs, action, rc)
+    if "task_id" not in st.session_state:
+        st.session_state.task_id = "task_easy"
+    if "last_result" not in st.session_state:
+        st.session_state.last_result = None
+    if "episode_done" not in st.session_state:
+        st.session_state.episode_done = False
+_init_state()
+# ---------------------------
+# HELPERS
+# ---------------------------
+def defect_badge(d):
+    m = {
+        "none":              ("b-none",    "✓ NONE"),
+        "missing_component": ("b-missing", "⚠ MISSING COMPONENT"),
+        "solder_bridge":     ("b-solder",  "⚡ SOLDER BRIDGE"),
+        "short_circuit":     ("b-short",   "✗ SHORT CIRCUIT"),
+    }
+    cls, label = m.get(d, ("b-none", d.upper()))
+    return f'<span class="defect-badge {cls}">{label}</span>'
+def reward_bar_html(label, score, color="#00e5ff"):
+    pct = int(score * 100)
+    return (
+        f'<div class="reward-row">'
+        f'  <span class="reward-label">{label}</span>'
+        f'  <div class="reward-bar-wrap">'
+        f'    <div class="reward-bar" style="width:{pct}%;background:{color};"></div>'
+        f'  </div>'
+        f'  <span class="reward-val">{score:.2f}</span>'
+        f'</div>'
+    )
+def get_env() -> SpectraQualEnv:
+    if st.session_state.env is None:
+        st.session_state.env = SpectraQualEnv(task_id=st.session_state.task_id)
+    return st.session_state.env
+# ---------------------------
+# HEADER
+# ---------------------------
+st.title("⚔️ SPECTRAQUAL — SMART PCB DECISION SYSTEM")
+st.markdown(
+    '<p style="font-family:\'Share Tech Mono\',monospace;font-size:0.72rem;'
+    'color:#1e4a5a;letter-spacing:0.16em;margin-top:-10px;margin-bottom:4px;">'
+    'REAL-TIME QUALITY INTELLIGENCE ENGINE · v3.0 · OpenEnv Compliant</p>',
+    unsafe_allow_html=True,
+)
+# ---------------------------
+# SIDEBAR TASK SELECTOR
+# ---------------------------
+with st.sidebar:
+    st.markdown("### 🎯 Task Selection")
+    task_choice = st.selectbox(
+        "Select Task",
+        options=list(TASKS.keys()),
+        format_func=lambda t: f"{t} ({TASKS[t]['difficulty'].upper()})",
+        index=list(TASKS.keys()).index(st.session_state.task_id),
+    )
+    if task_choice != st.session_state.task_id:
+        st.session_state.task_id   = task_choice
+        st.session_state.env       = None
+        st.session_state.score     = 0.0
+        st.session_state.history   = []
+        st.session_state.log       = []
+        st.session_state.last_result = None
+        st.session_state.episode_done = False
+    cfg = TASKS[st.session_state.task_id]
+    st.markdown(f"""
+    **Boards:** {cfg['n_boards']}
+    **Slots:** {cfg['n_slots']}
+    **Seed:** {cfg['seed']}
+    **Anomaly Rate:** {cfg['anomaly_rate']:.0%}
+    **Difficulty:** {cfg['difficulty'].upper()}
+    """)
+    st.markdown("---")
+    speed = st.slider("⚡ Speed (s/step)", 0.2, 2.0, 0.8, step=0.1)
+# ---------------------------
+# SPEED (fallback if sidebar collapsed)
+# ---------------------------
+if "speed" not in dir():
+    speed = 0.8
+st.markdown("<hr>", unsafe_allow_html=True)
+# ---------------------------
+# METRICS BAR
+# ---------------------------
+env_obj = get_env()
+state   = env_obj.state()
+m1, m2, m3, m4, m5 = st.columns(5)
+m1.metric("💰 Cumul. Reward",  f"{state['cumulative_reward']:.3f}")
+m2.metric("🎯 Accuracy",       f"{state['rolling_accuracy']:.1%}")
+m3.metric("⚙️ Active Slots",   sum(1 for s in state['slots'] if 0 < s < 9999))
+m4.metric("🧠 Decisions",      state['total_count'])
+m5.metric("⚠️ Bottlenecks",    state['bottleneck_count'])
+last_r = round(st.session_state.log[-1][2].normalized, 3) if st.session_state.log else "N/A"
+status_color = "#00e5ff" if st.session_state.log else "#1e4a5a"
+st.markdown(f"""
+<div style="font-family:'Share Tech Mono',monospace;font-size:0.75rem;
+    color:{status_color};padding:6px 14px;border:1px solid rgba(0,229,255,0.2);
+    border-radius:6px;display:inline-block;margin-top:10px;margin-bottom:4px;
+    background:rgba(0,229,255,0.03);letter-spacing:0.1em;">
+🟢 TASK: {st.session_state.task_id.upper()} &nbsp;·&nbsp; LAST REWARD: {last_r} &nbsp;·&nbsp; STEPS: {state['step']}
+</div>
+""", unsafe_allow_html=True)
+st.markdown("<hr>", unsafe_allow_html=True)
+# ---------------------------
+# CONTROL BUTTONS
+# ---------------------------
+c1, c2, c3, c4, c5 = st.columns(5)
+with c1:
+    if st.button("▶  RUN STEP"):
+        st.session_state.running  = False
+        st.session_state.run_once = True
+with c2:
+    if st.button("⚡  AUTO RUN"):
+        st.session_state.running = True
+with c3:
+    if st.button("⛔  STOP"):
+        st.session_state.running = False
+with c4:
+    if st.button("🔄  RESET"):
+        env_obj.reset()
+        st.session_state.score    = 0.0
+        st.session_state.history  = []
+        st.session_state.log      = []
+        st.session_state.last_result = None
+        st.session_state.episode_done = False
+with c5:
+    if st.button("🆕  NEW TASK"):
+        st.session_state.env = None
+        st.session_state.score   = 0.0
+        st.session_state.history = []
+        st.session_state.log     = []
+        st.session_state.last_result = None
+        st.session_state.episode_done = False
+# ---------------------------
+# CORE STEP
+# ---------------------------
+def run_step():
+    env  = get_env()
+    # Initialize if needed
+    if env._done or env._current_pcb is None:
+        result = env.reset()
+        if result.done:
+            st.session_state.episode_done = True
+            return None
+    # Build the current observation to determine the action
+    from reward import detect_anomaly
+    obs    = env._build_observation(*detect_anomaly(env._current_pcb))
+    # Use rule-based decision (greedy heuristic)
+    from env import decide_action
+    pcb_dict = {
+        "defect_type":    obs.defect_type,
+        "component_cost": obs.component_cost,
+        "criticality":    obs.criticality,
+    }
+    action_str = decide_action(pcb_dict)
+    result = env.step(PCBAction(action=action_str))
+    rc     = result.reward_components
+    st.session_state.score     = env.state()["cumulative_reward"]
+    st.session_state.history.append(st.session_state.score)
+    st.session_state.log.append((result.observation, action_str, rc))
+    st.session_state.last_result = result
+    if result.done:
+        st.session_state.episode_done = True
+    return result
+# ---------------------------
+# DISPLAY
+# ---------------------------
+def display(result):
+    from collections import Counter
+    obs = result.observation
+    rc  = result.reward_components
+    col1, col2 = st.columns(2, gap="large")
+    # ── LEFT ──
+    with col1:
+        st.subheader("PCB Info")
+        anomaly_html = ""
+        if obs.is_anomaly:
+            anomaly_html = f'<span class="anomaly-badge">⚠️ ANOMALY {obs.anomaly_score:.2f}</span>'
+        st.markdown(f"""
+        <div class="pcb-card">
+            <div><span class="lbl">Board ID &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span>
+                 <span class="val">{obs.board_id}</span></div>
+            <div><span class="lbl">Defect Type &nbsp;&nbsp;</span>
+                 {defect_badge(obs.defect_type)}</div>
+            <div><span class="lbl">Component Cost </span>
+                 <span class="val">₹{obs.component_cost:.2f}</span></div>
+            <div><span class="lbl">Criticality &nbsp;&nbsp;&nbsp;</span>
+                 <span class="val">{obs.criticality:.2f}</span></div>
+            <div><span class="lbl">Anomaly &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span>
+                 {anomaly_html if anomaly_html else '<span class="val" style="color:#2e6a80;">Normal</span>'}</div>
+        </div>
+        """, unsafe_allow_html=True)
+        st.subheader("Decision")
+        action = st.session_state.log[-1][1] if st.session_state.log else "N/A"
+        if action == "PASS":
+            st.success(f"✅  {action}")
+        elif "ROUTE" in action:
+            st.warning(f"🛠️  {action}")
+        elif action == "WAIT":
+            st.warning("⏳  WAITING FOR SLOT AVAILABILITY")
+        else:
+            st.error(f"❌  {action}")
+        if rc:
+            st.subheader("🧠 Why this decision?")
+            explanation_parts = rc.explanation.split(" | ")
+            for part in explanation_parts[:3]:
+                st.info(part)
+        st.subheader("Step Reward")
+        r = result.reward
+        if r >= 0.6:
+            st.markdown(f'<span class="rpill rpos">▲ {r:.4f}</span>', unsafe_allow_html=True)
+        elif r >= 0.35:
+            st.markdown(f'<span class="rpill rzero">● {r:.4f}</span>', unsafe_allow_html=True)
+        else:
+            st.markdown(f'<span class="rpill rneg">▼ {r:.4f}</span>', unsafe_allow_html=True)
+        if rc:
+            st.subheader("📊 Reward Component Breakdown")
+            components = [
+                ("Defect Handling",  rc.defect_reward,       "#00e5ff"),
+                ("Cost Efficiency",  rc.cost_efficiency,     "#00e676"),
+                ("Queue Mgmt",       rc.queue_penalty,       "#ffb700"),
+                ("Risk Factor",      rc.criticality_factor,  "#ff7800"),
+                ("Anomaly Bonus",    rc.anomaly_bonus,       "#ff00c8"),
+            ]
+            bars_html = ""
+            for label, val, color in components:
+                bars_html += reward_bar_html(label, val, color)
+            st.markdown(bars_html, unsafe_allow_html=True)
+        st.subheader("Rolling Metrics")
+        sub1, sub2 = st.columns(2)
+        with sub1:
+            st.metric("🎯 Accuracy", f"{obs.rolling_accuracy:.1%}")
+        with sub2:
+            st.metric("⚡ Throughput", f"{obs.throughput:.2f}")
+    # ── RIGHT ──
+    with col2:
+        st.subheader("Factory Slots")
+        slot_html = '<div class="slot-grid">'
+        for i, slot in enumerate(obs.slots_state):
+            if slot == -1:
+                slot_html += (f'<div class="slot-item"><div class="dot dot-lock"></div>'
+                              f'<span class="lock">SLOT {i:02d} · LOCKED</span></div>')
+            elif slot > 0:
+                slot_html += (f'<div class="slot-item"><div class="dot dot-busy"></div>'
+                              f'<span class="busy">SLOT {i:02d} · {slot}t</span></div>')
+            else:
+                slot_html += (f'<div class="slot-item"><div class="dot dot-free"></div>'
+                              f'<span class="free">SLOT {i:02d} · FREE</span></div>')
+        slot_html += '</div>'
+        st.markdown(slot_html, unsafe_allow_html=True)
+        st.subheader("Cumulative Reward")
+        score_color = "#00e676" if st.session_state.score >= 0.5 else "#ff5a5a"
+        st.markdown(
+            f'<div class="score-big" style="color:{score_color}">'
+            f'{st.session_state.score:.4f}</div>',
+            unsafe_allow_html=True,
+        )
+        st.subheader("📈 Reward Trend")
+        fig, ax = plt.subplots(figsize=(5.5, 3))
+        fig.patch.set_facecolor("#080c12")
+        ax.set_facecolor("#0a1420")
+        history = st.session_state.history
+        if history:
+            ax.plot(history, color="#00e5ff", linewidth=1.8,
+                    marker='o', markersize=3.5,
+                    markerfacecolor="#00e5ff", markeredgewidth=0)
+            ax.fill_between(range(len(history)), history, alpha=0.10, color="#00e5ff")
+            ax.axhline(y=0.6, color="#00e676", linewidth=0.8, linestyle="--", alpha=0.5, label="Success threshold")
+        ax.set_title("Cumulative Reward", color="#2e6a80", fontsize=9, pad=8)
+        ax.set_xlabel("Steps",  color="#2e6a80", fontsize=8)
+        ax.set_ylabel("Score",  color="#2e6a80", fontsize=8)
+        ax.set_ylim(0, max(max(history, default=1.0) * 1.1, 1.0))
+        ax.tick_params(colors="#2e6a80", labelsize=7)
+        ax.grid(color="#0d2535", linewidth=0.7, linestyle="--")
+        for spine in ax.spines.values():
+            spine.set_edgecolor("#0d2535")
+        fig.tight_layout(pad=1.2)
+        st.pyplot(fig)
+        plt.close(fig)
+        # Stacked Reward Components Over Time
+        if len(st.session_state.log) >= 2:
+            st.subheader("📊 Component Breakdown Over Time")
+            steps_data = st.session_state.log[-20:]  # last 20 steps
+            comp_labels = ["Defect", "Cost", "Queue", "Risk", "Anomaly"]
+            comp_colors = ["#00e5ff", "#00e676", "#ffb700", "#ff7800", "#ff00c8"]
+            comp_data   = {l: [] for l in comp_labels}
+            for _, _, rc_entry in steps_data:
+                if rc_entry:
+                    comp_data["Defect"].append(rc_entry.defect_reward)
+                    comp_data["Cost"].append(rc_entry.cost_efficiency)
+                    comp_data["Queue"].append(rc_entry.queue_penalty)
+                    comp_data["Risk"].append(rc_entry.criticality_factor)
+                    comp_data["Anomaly"].append(rc_entry.anomaly_bonus)
+            if any(comp_data.values()):
+                fig2, ax2 = plt.subplots(figsize=(5.5, 2.8))
+                fig2.patch.set_facecolor("#080c12")
+                ax2.set_facecolor("#0a1420")
+                x = list(range(len(next(iter(comp_data.values())))))
+                bottom = [0.0] * len(x)
+                for label, color in zip(comp_labels, comp_colors):
+                    vals = comp_data[label]
+                    if vals and len(vals) == len(x):
+                        # Scale each component by a flat 0.2 (an equal share of the
+                        # five components), not by its REWARD_WEIGHT_* value in config.py
+                        top = [b + v * 0.2 for b, v in zip(bottom, vals)]
+                        ax2.fill_between(x, bottom, top,
+                                         alpha=0.6, color=color, label=label)
+                        bottom = top
+                ax2.set_title("Reward Components (last 20 steps)", color="#2e6a80", fontsize=8, pad=6)
+                ax2.set_xlabel("Steps", color="#2e6a80", fontsize=7)
+                ax2.tick_params(colors="#2e6a80", labelsize=6)
+                ax2.grid(color="#0d2535", linewidth=0.5, linestyle="--")
+                for spine in ax2.spines.values():
+                    spine.set_edgecolor("#0d2535")
+                ax2.legend(loc="upper right", fontsize=6,
+                           facecolor="#080c12", edgecolor="#2e6a80", labelcolor="#c9d4e0")
+                fig2.tight_layout(pad=1.0)
+                st.pyplot(fig2)
+                plt.close(fig2)
+        # Decision Distribution
+        if st.session_state.log:
+            st.subheader("📊 Decision Distribution")
+            decisions = [entry[1] for entry in st.session_state.log]
+            counts = dict(Counter(decisions))  # Counter is imported at the top of display()
+            st.bar_chart(counts)
+    # Episode Done banner
+    if st.session_state.episode_done:
+        final = st.session_state.score
+        if final >= 0.6:
+            st.success(f"🏆 EPISODE COMPLETE — Score: {final:.4f} — SUCCESS!")
+        else:
+            st.warning(f"⚠️ EPISODE COMPLETE — Score: {final:.4f} — Below success threshold (0.60)")
+# ---------------------------
+# EXECUTION
+# ---------------------------
+if "run_once" in st.session_state and st.session_state.run_once:
+    result = run_step()
+    if result:
+        display(result)
+    st.session_state.run_once = False
+elif st.session_state.running:
+    placeholder = st.empty()
+    for _ in range(1000):
+        if not st.session_state.running:
+            break
+        if st.session_state.episode_done:
+            st.session_state.running = False
+            break
+        result = run_step()
+        if result:
+            with placeholder.container():
+                display(result)
+        time.sleep(speed)
+elif st.session_state.last_result:
+    display(st.session_state.last_result)
+else:
+    st.markdown("""
+    <div class="idle">
+        [ SYSTEM IDLE ]<br><br>
+        SELECT A TASK IN THE SIDEBAR &nbsp; · &nbsp; PRESS &nbsp; ▶ RUN STEP &nbsp; OR &nbsp; ⚡ AUTO RUN &nbsp; TO BEGIN
+    </div>
+    """, unsafe_allow_html=True)

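A note on `reward_bar_html` in src/app.py above: it converts `score` to a percentage with `int(score * 100)` and writes it straight into a CSS `width`. Below is a minimal sketch of a clamped variant, assuming a reward component could fall outside [0, 1]. That is an assumption, since reward.py is not shown in this part of the diff. The helper name `clamped_pct` is hypothetical and not part of the commit.

```python
def clamped_pct(score: float) -> int:
    """Convert a score to a CSS width percentage, clamped to 0..100."""
    pct = int(score * 100)
    return max(0, min(100, pct))


def reward_bar_html(label: str, score: float, color: str = "#00e5ff") -> str:
    """Same markup as the app.py helper, but with the bar width clamped."""
    pct = clamped_pct(score)
    return (
        f'<div class="reward-row">'
        f'  <span class="reward-label">{label}</span>'
        f'  <div class="reward-bar-wrap">'
        f'    <div class="reward-bar" style="width:{pct}%;background:{color};"></div>'
        f'  </div>'
        f'  <span class="reward-val">{score:.2f}</span>'
        f'</div>'
    )
```

The numeric label still shows the unclamped value, so an out-of-range component stays visible in the text even though the bar itself is capped.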
src/config.py ADDED

	@@ -0,0 +1,116 @@

+"""
+config.py — SpectraQual Centralized Configuration
+All constants, reward weights, task definitions, and environment settings live here.
+"""
+# ---------------------------
+# DEFECT TYPES
+# ---------------------------
+DEFECT_TYPES = ["none", "missing_component", "solder_bridge", "short_circuit"]
+# ---------------------------
+# ACTION SPACE
+# ---------------------------
+ACTIONS = [
+    "PASS",
+    "SCRAP",
+    "ROUTE_COMPONENT_REPLACEMENT",
+    "ROUTE_SOLDERING",
+    "ROUTE_DIAGNOSTICS",
+    "WAIT",
+]
+# Valid actions per defect type
+VALID_ACTIONS = {
+    "none":              ["PASS"],
+    "missing_component": ["ROUTE_COMPONENT_REPLACEMENT", "SCRAP"],
+    "solder_bridge":     ["ROUTE_SOLDERING", "WAIT", "SCRAP"],
+    "short_circuit":     ["SCRAP", "ROUTE_DIAGNOSTICS"],
+}
+# ---------------------------
+# FACTORY SETTINGS
+# ---------------------------
+N_SOLDERING_SLOTS = 3          # Number of parallel soldering slots
+SOLDERING_JOB_DURATION = 2     # Time units a soldering job occupies a slot
+# ---------------------------
+# PCB GENERATION BOUNDS
+# ---------------------------
+COMPONENT_COST_MIN = 10.0
+COMPONENT_COST_MAX = 200.0
+CRITICALITY_MIN    = 0.1
+CRITICALITY_MAX    = 1.0
+# Anomaly thresholds: a board exceeding either value is an anomaly candidate
+ANOMALY_COST_THRESHOLD       = 180.0   # cost > this → anomaly candidate
+ANOMALY_CRITICALITY_THRESHOLD = 0.92   # criticality > this → anomaly candidate
+# ---------------------------
+# REWARD WEIGHTS (multi-component)
+# ---------------------------
+REWARD_WEIGHT_DEFECT      = 0.35
+REWARD_WEIGHT_COST        = 0.25
+REWARD_WEIGHT_QUEUE       = 0.20
+REWARD_WEIGHT_CRITICALITY = 0.10
+REWARD_WEIGHT_ANOMALY     = 0.10
+# Raw reward scaling reference (used for normalization)
+RAW_REWARD_MIN = -60.0
+RAW_REWARD_MAX = 160.0
+# ---------------------------
+# TASK DEFINITIONS
+# ---------------------------
+TASKS = {
+    "task_easy": {
+        "id":          "task_easy",
+        "description": "Triage 10 boards with no slot pressure. Focus: correct defect classification.",
+        "difficulty":  "easy",
+        "n_boards":    10,
+        "seed":        42,
+        "n_slots":     3,       # all slots always available
+        "anomaly_rate": 0.0,
+    },
+    "task_medium": {
+        "id":          "task_medium",
+        "description": "Triage 15 boards with one soldering slot. Manage queue pressure.",
+        "difficulty":  "medium",
+        "n_boards":    15,
+        "seed":        99,
+        "n_slots":     1,       # only 1 slot → queue pressure
+        "anomaly_rate": 0.1,
+    },
+    "task_hard": {
+        "id":          "task_hard",
+        "description": "Triage 20 boards with mixed anomalies and tight slot constraints.",
+        "difficulty":  "hard",
+        "n_boards":    20,
+        "seed":        777,
+        "n_slots":     1,
+        "anomaly_rate": 0.25,
+    },
+}
+# Grader thresholds
+MEDIUM_ECONOMIC_TARGET   = 0.50   # 50% of max possible economic reward
+HARD_ANOMALY_RATE_TARGET = 0.50   # must flag ≥50% of actual anomalies
+# ---------------------------
+# INFERENCE SCRIPT SETTINGS
+# ---------------------------
+MAX_STEPS_PER_TASK    = 25        # safety cap (must fit in 20-min runtime)
+SUCCESS_SCORE_THRESHOLD = 0.60    # ≥0.60 normalized score = success
+TEMPERATURE           = 0.2
+MAX_TOKENS            = 64        # actions are short, no need for long outputs
+# ---------------------------
+# LOGGING COLOR REFERENCE (for app.py)
+# ---------------------------
+COLOR_PRIMARY  = "#00e5ff"
+COLOR_SUCCESS  = "#00e676"
+COLOR_WARNING  = "#ffb700"
+COLOR_DANGER   = "#ff3232"
+COLOR_BG       = "#080c12"
+COLOR_CARD     = "#0d1b2a"
+COLOR_MUTED    = "#2e6a80"

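The reward constants in src/config.py above can be sanity-checked in isolation. The sketch below copies the five weights and the raw-reward bounds from the file, then shows one plausible way they might be combined: a weighted sum of per-component scores and a min-max normalization of a raw reward. The actual formula lives in src/reward.py, which is not shown here, so `weighted_reward` and `normalize_raw` are hypothetical names and the formulas are assumptions, not the project's implementation.

```python
# Values copied from config.py; the dict layout itself is illustrative.
REWARD_WEIGHTS = {
    "defect":      0.35,
    "cost":        0.25,
    "queue":       0.20,
    "criticality": 0.10,
    "anomaly":     0.10,
}
RAW_REWARD_MIN = -60.0
RAW_REWARD_MAX = 160.0


def weighted_reward(components: dict) -> float:
    """Weighted sum of component scores (assumed to each lie in [0, 1])."""
    return sum(REWARD_WEIGHTS[name] * components[name] for name in REWARD_WEIGHTS)


def normalize_raw(raw: float) -> float:
    """Min-max scale a raw reward into [0, 1], clamping out-of-range values."""
    span = RAW_REWARD_MAX - RAW_REWARD_MIN
    return min(1.0, max(0.0, (raw - RAW_REWARD_MIN) / span))


if __name__ == "__main__":
    perfect = {name: 1.0 for name in REWARD_WEIGHTS}
    print(round(weighted_reward(perfect), 6))
    print(normalize_raw(50.0))
```

Because the weights sum to 1.0, a weighted sum of components that each lie in [0, 1] also stays in [0, 1], which is compatible with the 0.60 `SUCCESS_SCORE_THRESHOLD` being read as a normalized score.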
src/env.py ADDED

	@@ -0,0 +1,358 @@

+"""
+env.py — SpectraQual OpenEnv-Compliant Environment
+Implements the full OpenEnv interface: reset() / step() / state()
+with seeding, anomaly detection, episode management, and rolling metrics.
+"""
+from __future__ import annotations
+import random
+import sys
+import os
+from typing import Dict, Any, Optional, List
+# Allow running from src/ directory directly
+sys.path.insert(0, os.path.dirname(__file__))
+from config import (
+    DEFECT_TYPES,
+    VALID_ACTIONS,
+    N_SOLDERING_SLOTS,
+    SOLDERING_JOB_DURATION,
+    COMPONENT_COST_MIN,
+    COMPONENT_COST_MAX,
+    CRITICALITY_MIN,
+    CRITICALITY_MAX,
+    TASKS,
+)
+from models import PCBObservation, PCBAction, StepResult, RewardComponents
+from reward import calculate_reward, detect_anomaly
+# ---------------------------
+# SPECTRAQUAL ENVIRONMENT
+# ---------------------------
+class SpectraQualEnv:
+    """
+    PCB Smart Quality-Control Triage Environment.
+    An AI agent processes a stream of printed circuit boards, each with a
+    randomly (but reproducibly seeded) assigned defect. The agent must choose
+    the optimal triage action given economic constraints and factory slot availability.
+    Implements the OpenEnv interface:
+        reset()  → StepResult (initial observation)
+        step()   → StepResult
+        state()  → dict (full internal state)
+    """
+    def __init__(self, task_id: str = "task_easy", seed: Optional[int] = None):
+        if task_id not in TASKS:
+            raise ValueError(f"Unknown task_id '{task_id}'. Valid: {list(TASKS.keys())}")
+        self.task_cfg   = TASKS[task_id]
+        self.task_id    = task_id
+        self.seed       = seed if seed is not None else self.task_cfg["seed"]
+        self._rng       = random.Random(self.seed)
+        # Runtime state (initialized on reset)
+        self._slots:          List[int]          = []
+        self._step_num:       int                = 0
+        self._done:           bool               = True
+        self._current_pcb:    Optional[Dict]     = None
+        self._correct_count:  int                = 0
+        self._total_count:    int                = 0
+        self._bottleneck_cnt: int                = 0
+        self._anomaly_total:  int                = 0
+        self._anomaly_flagged:int                = 0
+        self._cumulative_reward: float           = 0.0
+        self._reward_history: List[float]        = []
+        self._all_rewards:    List[float]        = []
+    # ------------------------------------------------
+    # INTERNAL HELPERS
+    # ------------------------------------------------
+    def _reset_slots(self) -> None:
+        n = self.task_cfg["n_slots"]
+        # Start with all N_SOLDERING_SLOTS slots free (0 = free)
+        self._slots = [0] * N_SOLDERING_SLOTS
+        # Mark slots beyond the task limit as permanently busy (simulates fewer slots)
+        for i in range(n, N_SOLDERING_SLOTS):
+            self._slots[i] = 9999  # permanently locked
+    def _get_slot_view(self) -> List[int]:
+        """Public view: replace 9999 sentinel with -1 for clarity."""
+        return [s if s != 9999 else -1 for s in self._slots]
+    def _count_free_slots(self) -> int:
+        return sum(1 for s in self._slots if s == 0)
+    def _tick_slots(self) -> None:
+        """Advance factory time: reduce non-locked slot timers by 1."""
+        for i in range(len(self._slots)):
+            if 0 < self._slots[i] < 9999:
+                self._slots[i] -= 1
+    def _assign_slot(self) -> bool:
+        """Try to assign a soldering job. Returns True if successful."""
+        for i in range(len(self._slots)):
+            if self._slots[i] == 0:
+                self._slots[i] = SOLDERING_JOB_DURATION
+                return True
+        return False
+    def _generate_pcb(self) -> Dict[str, Any]:
+        """Generate a random PCB using internal seeded RNG."""
+        # Inject anomaly based on task config
+        anomaly_roll = self._rng.random()
+        anomaly_rate = self.task_cfg.get("anomaly_rate", 0.0)
+        if anomaly_rate > 0 and anomaly_roll < anomaly_rate:
+            # Force extreme values
+            cost        = round(self._rng.uniform(185.0, 200.0), 2)
+            criticality = round(self._rng.uniform(0.93, 1.0), 2)
+            defect      = self._rng.choice(["missing_component", "short_circuit"])
+        else:
+            defect      = self._rng.choice(DEFECT_TYPES)
+            cost        = round(self._rng.uniform(COMPONENT_COST_MIN, COMPONENT_COST_MAX), 2)
+            criticality = round(self._rng.uniform(CRITICALITY_MIN, CRITICALITY_MAX), 2)
+        board_id = f"SQ-{self._rng.randint(1000, 9999)}"
+        return {
+            "board_id":       board_id,
+            "defect_type":    defect,
+            "component_cost": cost,
+            "criticality":    criticality,
+        }
+    def _is_correct(self, defect: str, action: str) -> bool:
+        """Check if action is the single best action for this defect."""
+        best = {
+            "none":              "PASS",
+            "missing_component": "ROUTE_COMPONENT_REPLACEMENT",
+            "solder_bridge":     "ROUTE_SOLDERING",
+            "short_circuit":     "SCRAP",
+        }
+        return best.get(defect) == action
+    def _build_observation(self, is_anomaly: bool, anomaly_score: float) -> PCBObservation:
+        pcb         = self._current_pcb
+        defect      = pcb["defect_type"]
+        free_slots  = self._count_free_slots()
+        slot_view   = self._get_slot_view()
+        total       = self._total_count or 1
+        return PCBObservation(
+            board_id=pcb["board_id"],
+            defect_type=defect,
+            component_cost=pcb["component_cost"],
+            criticality=pcb["criticality"],
+            slots_free=free_slots,
+            slots_state=slot_view,
+            is_anomaly=is_anomaly,
+            anomaly_score=round(anomaly_score, 4),
+            step=self._step_num,
+            task_id=self.task_id,
+            valid_actions=VALID_ACTIONS.get(defect, ["SCRAP"]),
+            rolling_accuracy=round(self._correct_count / total, 4),
+            throughput=round(self._total_count / max(self._step_num, 1), 4),
+            cumulative_reward=round(self._cumulative_reward, 4),
+        )
+    # ------------------------------------------------
+    # PUBLIC OPENENV INTERFACE
+    # ------------------------------------------------
+    def reset(self) -> StepResult:
+        """
+        Reset the environment to a clean initial state.
+        Returns the first observation without a reward.
+        """
+        self._rng             = random.Random(self.seed)
+        self._step_num        = 0
+        self._done            = False
+        self._correct_count   = 0
+        self._total_count     = 0
+        self._bottleneck_cnt  = 0
+        self._anomaly_total   = 0
+        self._anomaly_flagged = 0
+        self._cumulative_reward = 0.0
+        self._reward_history  = []
+        self._all_rewards     = []
+        self._reset_slots()
+        self._current_pcb = self._generate_pcb()
+        is_anomaly, anomaly_score = detect_anomaly(self._current_pcb)
+        # Do not increment _anomaly_total here: step() counts every board once,
+        # including this first one, at the moment it is processed.
+        obs = self._build_observation(is_anomaly, anomaly_score)
+        return StepResult(
+            observation=obs,
+            reward=0.0,
+            reward_components=None,
+            done=False,
+            info={"message": "Environment reset. Episode started.", "seed": self.seed},
+        )
+    def step(self, action: PCBAction) -> StepResult:
+        """
+        Apply an action to the current board.
+        Advances factory state, computes reward, generates next PCB.
+        """
+        if self._done:
+            raise RuntimeError("Episode is done. Call reset() before stepping.")
+        self._step_num  += 1
+        self._total_count += 1
+        action_str = action.action
+        pcb        = self._current_pcb
+        defect     = pcb["defect_type"]
+        # Invalid actions never raise: they are remapped to SCRAP (safe fallback)
+        # and scored as SCRAP, so any penalty comes from SCRAP's own reward.
+        valid = VALID_ACTIONS.get(defect, ["SCRAP"])
+        if action_str not in valid:
+            action_str = "SCRAP"
+        # Factory tick
+        self._tick_slots()
+        # Snapshot the slots before assignment so the queue component of the
+        # reward sees the slot state the agent actually faced.
+        slots_before = list(self._slots)
+        # Handle soldering slot assignment
+        if action_str == "ROUTE_SOLDERING":
+            assigned = self._assign_slot()
+            if not assigned:
+                self._bottleneck_cnt += 1
+        # Anomaly detection
+        is_anomaly, anomaly_score = detect_anomaly(pcb)
+        if is_anomaly:
+            self._anomaly_total += 1
+            # Track if agent "handled" anomaly correctly (chose optimal action)
+            if self._is_correct(defect, action_str):
+                self._anomaly_flagged += 1
+        # Reward
+        rc = calculate_reward(
+            pcb=pcb,
+            action=action_str,
+            slots_state=slots_before,
+            is_anomaly=is_anomaly,
+        )
+        reward = rc.normalized
+        self._cumulative_reward += reward
+        self._all_rewards.append(reward)
+        self._reward_history.append(reward)
+        # Accuracy tracking
+        if self._is_correct(defect, action_str):
+            self._correct_count += 1
+        # Episode done?
+        max_boards = self.task_cfg["n_boards"]
+        done = (self._total_count >= max_boards)
+        self._done = done
+        # Prepare next PCB (for observation even if done)
+        if not done:
+            self._current_pcb = self._generate_pcb()
+            next_is_anomaly, next_anomaly_score = detect_anomaly(self._current_pcb)
+        else:
+            # Episode over — reuse last PCB for observation
+            next_is_anomaly, next_anomaly_score = is_anomaly, anomaly_score
+        obs = self._build_observation(next_is_anomaly, next_anomaly_score)
+        return StepResult(
+            observation=obs,
+            reward=reward,
+            reward_components=rc,
+            done=done,
+            info={
+                "action_taken":     action_str,
+                "defect":           defect,
+                "board_id":         pcb["board_id"],
+                "is_anomaly":       is_anomaly,
+                "anomaly_score":    round(anomaly_score, 4),
+                "bottleneck_count": self._bottleneck_cnt,
+                "step":             self._step_num,
+                "correct_count":    self._correct_count,
+                "total_count":      self._total_count,
+            },
+        )
+    def state(self) -> Dict[str, Any]:
+        """Return the full internal environment state as a dict."""
+        return {
+            "task_id":           self.task_id,
+            "seed":              self.seed,
+            "step":              self._step_num,
+            "done":              self._done,
+            "slots":             self._get_slot_view(),
+            "free_slots":        self._count_free_slots(),
+            "current_pcb":       self._current_pcb,
+            "correct_count":     self._correct_count,
+            "total_count":       self._total_count,
+            "bottleneck_count":  self._bottleneck_cnt,
+            "anomaly_total":     self._anomaly_total,
+            "anomaly_flagged":   self._anomaly_flagged,
+            "cumulative_reward": round(self._cumulative_reward, 4),
+            "reward_history":    self._all_rewards,
+            "rolling_accuracy":  round(self._correct_count / max(self._total_count, 1), 4),
+            "throughput":        round(self._total_count / max(self._step_num, 1), 4),
+        }
+# ---------------------------
+# LEGACY COMPAT (for main.py / train.py / app.py)
+# ---------------------------
+# The old code imported module-level factory dict + generate_pcb / decide_action etc.
+# We keep those here as thin wrappers so existing imports don't break.
+_default_env = SpectraQualEnv("task_easy")
+factory = {"soldering_slots": _default_env._slots}
+def generate_pcb():
+    return _default_env._generate_pcb()
+def update_factory():
+    _default_env._tick_slots()
+    factory["soldering_slots"] = _default_env._get_slot_view()
+def assign_soldering_job():
+    return _default_env._assign_slot()
+def decide_action(pcb):
+    """Legacy rule-based decision (used by main.py)."""
+    defect = pcb["defect_type"]
+    cost   = pcb["component_cost"]
+    critical = pcb["criticality"]
+    if defect == "none":
+        return "PASS"
+    if defect == "missing_component":
+        return "ROUTE_COMPONENT_REPLACEMENT" if cost > 50 else "SCRAP"
+    if defect == "solder_bridge":
+        return "ROUTE_SOLDERING" if _default_env._count_free_slots() > 0 else "WAIT"
+    if defect == "short_circuit":
+        return "SCRAP" if critical > 0.7 else "ROUTE_DIAGNOSTICS"
+    return "SCRAP"
+def calculate_reward_legacy(pcb, decision):
+    """Legacy single-float reward (used by train.py)."""
+    rc = calculate_reward(
+        pcb=pcb,
+        action=decision,
+        slots_state=_default_env._slots,
+        is_anomaly=False,
+    )
+    # Scale normalized [0,1] back to a range train.py expects
+    return (rc.normalized - 0.5) * 200
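
Review note: reset() rebuilds `random.Random(self.seed)`, which is what lets run_task() replay an episode and see the same boards. The sketch below illustrates that property on its own. It is not the environment's code: `generate_boards` and the seed value are hypothetical, and the cost bounds are assumed from the 10.0 to 200.0 limits on `PCBObservation.component_cost`.

```python
import random

# Assumed stand-ins for config.py values, which are not shown in this diff.
DEFECT_TYPES = ["none", "missing_component", "solder_bridge", "short_circuit"]
COST_MIN, COST_MAX = 10.0, 200.0


def generate_boards(seed: int, n: int) -> list[dict]:
    """Generate n boards from a private RNG, leaving the global random state untouched."""
    rng = random.Random(seed)
    boards = []
    for _ in range(n):
        boards.append({
            "board_id": f"SQ-{rng.randint(1000, 9999)}",
            "defect_type": rng.choice(DEFECT_TYPES),
            "component_cost": round(rng.uniform(COST_MIN, COST_MAX), 2),
        })
    return boards


# Two runs with the same seed should yield identical board sequences.
first = generate_boards(seed=42, n=5)
second = generate_boards(seed=42, n=5)
print(first == second)
```

Using a per-instance `random.Random` rather than the module-level functions keeps two environments with different seeds from interfering with each other.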

src/main.py ADDED Viewed

	@@ -0,0 +1,28 @@

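+from env import generate_pcb, decide_action
+# calculate_reward() now needs slots_state and returns RewardComponents;
+# the legacy wrapper keeps the two-argument, single-float interface used below.
+from env import calculate_reward_legacy as calculate_reward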
+from env import update_factory, factory
+TOTAL_BOARDS = 10
+total_score = 0
+# Reset factory
+factory["soldering_slots"] = [0, 0, 0]
+for i in range(TOTAL_BOARDS):
+    print(f"\n--- TIME STEP {i+1} ---")
+    # Update factory (time passes)
+    update_factory()
+    pcb = generate_pcb()
+    decision = decide_action(pcb)
+    reward = calculate_reward(pcb, decision)
+    total_score += reward
+    print(f"PCB: {pcb}")
+    print(f"Decision: {decision}")
+    print(f"Reward: {round(reward,2)}")
+    print(f"Factory Slots: {factory['soldering_slots']}")
+print("\n⚔️ Total Economic Score:", round(total_score,2))

src/models.py ADDED Viewed

	@@ -0,0 +1,140 @@

+"""
+models.py — SpectraQual Typed Pydantic Models
+OpenEnv spec requires: typed Observation, Action, Reward models.
+"""
+from __future__ import annotations
+from typing import List, Literal, Optional, Dict, Any
+from pydantic import BaseModel, Field
+# ---------------------------
+# PCB OBSERVATION
+# ---------------------------
+class PCBObservation(BaseModel):
+    """Observation returned after each reset() or step()."""
+    board_id: str = Field(..., description="Unique board identifier, e.g. SQ-4321")
+    defect_type: Literal[
+        "none", "missing_component", "solder_bridge", "short_circuit"
+    ] = Field(..., description="Type of defect detected on the PCB")
+    component_cost: float = Field(
+        ..., ge=10.0, le=200.0, description="Replacement cost of damaged component in ₹"
+    )
+    criticality: float = Field(
+        ..., ge=0.1, le=1.0, description="Risk score — higher means more critical circuit"
+    )
+    slots_free: int = Field(
+        ..., ge=0, description="Number of soldering slots currently available"
+    )
+    slots_state: List[int] = Field(
+        ..., description="Remaining time units for each soldering slot (0=free)"
+    )
+    is_anomaly: bool = Field(
+        False, description="True if this board exhibits rare/unusual characteristics"
+    )
+    anomaly_score: float = Field(
+        0.0, ge=0.0, le=1.0, description="Anomaly confidence (0=normal, 1=highly anomalous)"
+    )
+    step: int = Field(..., ge=0, description="Current step number in the episode")
+    task_id: str = Field(..., description="ID of the active task")
+    valid_actions: List[str] = Field(
+        ..., description="List of valid actions for this observation"
+    )
+    # --- Real-time metrics ---
+    rolling_accuracy: float = Field(
+        0.0, ge=0.0, le=1.0, description="Fraction of correct decisions so far"
+    )
+    throughput: float = Field(
+        0.0, ge=0.0, description="Boards processed per step so far"
+    )
+    cumulative_reward: float = Field(
+        0.0, description="Cumulative normalized reward so far in this episode"
+    )
+# ---------------------------
+# PCB ACTION
+# ---------------------------
+class PCBAction(BaseModel):
+    """Action submitted by an agent to the environment."""
+    action: Literal[
+        "PASS",
+        "SCRAP",
+        "ROUTE_COMPONENT_REPLACEMENT",
+        "ROUTE_SOLDERING",
+        "ROUTE_DIAGNOSTICS",
+        "WAIT",
+    ] = Field(..., description="Decision made for the current PCB")
+# ---------------------------
+# REWARD COMPONENTS
+# ---------------------------
+class RewardComponents(BaseModel):
+    """Decomposed reward signal for transparency and debugging."""
+    defect_reward: float = Field(
+        ..., description="Score for handling the defect correctly (0.0–1.0)"
+    )
+    cost_efficiency: float = Field(
+        ..., description="Economic value retained vs. lost (0.0–1.0)"
+    )
+    queue_penalty: float = Field(
+        ..., description="Penalty for creating factory bottlenecks (0.0–1.0, lower is worse)"
+    )
+    criticality_factor: float = Field(
+        ..., description="Risk-adjusted modifier based on criticality (0.0–1.0)"
+    )
+    anomaly_bonus: float = Field(
+        0.0, description="Bonus for correctly flagging/handling anomalous board (0.0–1.0)"
+    )
+    total_raw: float = Field(
+        ..., description="Weighted sum of all components before normalization"
+    )
+    normalized: float = Field(
+        ..., ge=0.0, le=1.0, description="Final normalized reward in [0.0, 1.0]"
+    )
+    explanation: str = Field(
+        ..., description="Human-readable explanation of why this reward was given"
+    )
+# ---------------------------
+# STEP RESULT
+# ---------------------------
+class StepResult(BaseModel):
+    """Full result returned by step() and reset()."""
+    observation: PCBObservation
+    reward: float = Field(
+        0.0, ge=0.0, le=1.0, description="Normalized reward for this step [0.0, 1.0]"
+    )
+    reward_components: Optional[RewardComponents] = Field(
+        None, description="Detailed breakdown of reward components"
+    )
+    done: bool = Field(..., description="True if the episode has ended")
+    info: Dict[str, Any] = Field(
+        default_factory=dict, description="Additional diagnostic info"
+    )
+# ---------------------------
+# TASK RESULT (for graders)
+# ---------------------------
+class TaskResult(BaseModel):
+    """Summary of a completed task run, consumed by graders."""
+    task_id: str
+    total_steps: int
+    rewards: List[float]                  # per-step normalized rewards
+    correct_decisions: int
+    total_decisions: int
+    bottleneck_count: int                  # times queue was maxed out
+    anomaly_total: int                     # how many anomaly boards appeared
+    anomaly_flagged: int                   # how many the agent correctly flagged
+    cumulative_raw_reward: float
+    max_possible_raw: float
+    final_score: float = 0.0              # filled by grader

src/reward.py ADDED Viewed

	@@ -0,0 +1,288 @@

+"""
+reward.py — SpectraQual Multi-Component Normalized Reward
+Replaces duplicated logic in env.py and old reward.py.
+Reward is decomposed into 5 components and normalized to [0.0, 1.0].
+This gives the agent a rich, non-sparse signal at every step.
+"""
+from __future__ import annotations
+import math
+from typing import Dict, Any, List
+from config import (
+    REWARD_WEIGHT_DEFECT,
+    REWARD_WEIGHT_COST,
+    REWARD_WEIGHT_QUEUE,
+    REWARD_WEIGHT_CRITICALITY,
+    REWARD_WEIGHT_ANOMALY,
+    COMPONENT_COST_MIN,
+    COMPONENT_COST_MAX,
+    ANOMALY_COST_THRESHOLD,
+    ANOMALY_CRITICALITY_THRESHOLD,
+)
+from models import RewardComponents
+# ---------------------------
+# NORMALIZATION HELPERS
+# ---------------------------
+def _sigmoid_normalize(x: float, scale: float = 0.025) -> float:
+    """Sigmoid-based normalization: output is always in (0, 1)."""
+    return 1.0 / (1.0 + math.exp(-scale * x))
+def _clamp(x: float, lo: float = 0.0, hi: float = 1.0) -> float:
+    return max(lo, min(hi, x))
+def _cost_fraction(cost: float) -> float:
+    """Normalize cost into [0, 1] range."""
+    return (cost - COMPONENT_COST_MIN) / (COMPONENT_COST_MAX - COMPONENT_COST_MIN)
+# ---------------------------
+# ANOMALY DETECTION
+# ---------------------------
+def detect_anomaly(pcb: Dict[str, Any]) -> tuple[bool, float]:
+    """
+    Flag a board as an anomaly if it has extreme cost AND high criticality.
+    Returns (is_anomaly, anomaly_score 0.0–1.0).
+    """
+    cost_flag     = pcb["component_cost"] >= ANOMALY_COST_THRESHOLD
+    critical_flag = pcb["criticality"] >= ANOMALY_CRITICALITY_THRESHOLD
+    if cost_flag and critical_flag:
+        # Combine both signals into a confidence score
+        cost_score     = _cost_fraction(pcb["component_cost"])
+        critical_score = pcb["criticality"]
+        anomaly_score  = _clamp(0.5 * cost_score + 0.5 * critical_score)
+        return True, anomaly_score
+    # Partial anomaly: one signal strong
+    if cost_flag or critical_flag:
+        score = _cost_fraction(pcb["component_cost"]) * 0.4 + pcb["criticality"] * 0.3
+        return False, _clamp(score)
+    return False, 0.0
+# ---------------------------
+# COMPONENT 1 — DEFECT REWARD
+# ---------------------------
+def _defect_component(defect: str, action: str) -> tuple[float, str]:
+    """
+    Score the correctness of the action given the defect type.
+    Returns (raw_score 0.0–1.0, explanation_fragment)
+    """
+    mapping = {
+        ("none",               "PASS"):                          (1.00, "Correct PASS on clean board"),
+        ("none",               "SCRAP"):                         (0.00, "Wasteful SCRAP on clean board"),
+        ("missing_component",  "ROUTE_COMPONENT_REPLACEMENT"):   (1.00, "Optimal route for missing component"),
+        ("missing_component",  "SCRAP"):                         (0.30, "Suboptimal SCRAP — value lost"),
+        ("solder_bridge",      "ROUTE_SOLDERING"):               (1.00, "Correct soldering route"),
+        ("solder_bridge",      "WAIT"):                          (0.40, "WAIT acceptable — preserves board"),
+        ("solder_bridge",      "SCRAP"):                         (0.10, "Poor choice — solder bridge is repairable"),
+        ("short_circuit",      "SCRAP"):                         (1.00, "Correct SCRAP for high-risk short circuit"),
+        ("short_circuit",      "ROUTE_DIAGNOSTICS"):             (0.80, "Diagnostics acceptable for low-risk short"),
+        ("short_circuit",      "PASS"):                          (0.00, "Dangerous PASS on short circuit"),
+    }
+    key = (defect, action)
+    if key in mapping:
+        score, expl = mapping[key]
+        return score, expl
+    # Any other invalid combination
+    return 0.05, f"Invalid action '{action}' for defect '{defect}'"
+# ---------------------------
+# COMPONENT 2 — COST EFFICIENCY
+# ---------------------------
+def _cost_component(defect: str, action: str, cost: float) -> tuple[float, str]:
+    """
+    Measure economic efficiency of the decision.
+    Returns (score 0.0–1.0, explanation_fragment)
+    """
+    cf = _cost_fraction(cost)
+    if defect == "none":
+        return (1.0, "No cost involved in PASS") if action == "PASS" else (0.5, "Unnecessary action cost")
+    if defect == "missing_component":
+        if action == "ROUTE_COMPONENT_REPLACEMENT":
+            # High-cost boards benefit more from repair
+            return (_clamp(0.5 + 0.5 * cf), f"Repair recovers {cf:.0%} of component value")
+        else:  # SCRAP
+            # Scrapping expensive boards wastes value
+            return (_clamp(1.0 - cf), f"Scrap wastes {cf:.0%} of component value")
+    if defect == "solder_bridge":
+        if action == "ROUTE_SOLDERING":
+            return (_clamp(0.6 + 0.3 * cf), "Soldering route recovers board value")
+        elif action == "WAIT":
+            return (0.45, "WAIT preserves board but delays throughput")
+        else:  # SCRAP
+            return (_clamp(0.5 - 0.4 * cf), "Scrapping repairable board is costly")
+    if defect == "short_circuit":
+        if action == "SCRAP":
+            return (0.80, "Scrapping avoids downstream failure cost")
+        elif action == "ROUTE_DIAGNOSTICS":
+            return (0.70, "Diagnostics adds some cost but recovers revenue")
+        else:
+            return (0.10, "Wrong action risks high downstream failure penalty")
+    return (0.3, "Unknown defect/action combination")
+# ---------------------------
+# COMPONENT 3 — QUEUE PENALTY
+# ---------------------------
+def _queue_component(action: str, slots_state: List[int]) -> tuple[float, str]:
+    """
+    Penalize bottleneck creation. Returns (score 0.0–1.0, explanation_fragment).
+    High score = no queue problem. Low score = bad queue usage.
+    """
+    free_slots = slots_state.count(0)
+    total_slots = len(slots_state)
+    if action == "ROUTE_SOLDERING":
+        if free_slots > 0:
+            utilization = 1.0 - (free_slots - 1) / total_slots
+            return (_clamp(0.6 + 0.4 * utilization),
+                    f"Soldering assigned to free slot ({free_slots - 1} remaining)")
+        else:
+            # All slots full → bottleneck
+            return (0.0, "BOTTLENECK: all soldering slots occupied")
+    if action == "WAIT":
+        if free_slots == 0:
+            return (0.55, "WAIT appropriate — no slot available")
+        else:
+            return (0.35, "Unnecessary WAIT — slots were available")
+    # Non-soldering actions don't stress the queue
+    occupancy_ratio = sum(1 for s in slots_state if s > 0) / total_slots
+    return (_clamp(1.0 - 0.2 * occupancy_ratio), "No queue impact from this action")
+# ---------------------------
+# COMPONENT 4 — CRITICALITY
+# ---------------------------
+def _criticality_component(defect: str, action: str, criticality: float) -> tuple[float, str]:
+    """
+    Risk-adjust the decision based on board criticality.
+    High-criticality wrong decisions are severely penalized.
+    """
+    # Optimal action scores well regardless of criticality
+    optimal = {
+        "none":              "PASS",
+        "missing_component": "ROUTE_COMPONENT_REPLACEMENT",
+        "solder_bridge":     "ROUTE_SOLDERING",
+        "short_circuit":     "SCRAP",
+    }
+    is_optimal = (optimal.get(defect) == action)
+    if is_optimal:
+        # Reward scales slightly with criticality — making the right call on risky boards is harder
+        return (_clamp(0.7 + 0.3 * criticality), f"Correct action on criticality={criticality:.2f} board")
+    if defect == "short_circuit" and action not in ("SCRAP", "ROUTE_DIAGNOSTICS"):
+        # Dangerous wrong action on high-criticality board
+        penalty = criticality
+        return (_clamp(1.0 - penalty), f"Risky action on critical short_circuit board (criticality={criticality:.2f})")
+    # Sub-optimal but not dangerous
+    return (_clamp(0.5 - 0.2 * criticality), f"Sub-optimal action with criticality={criticality:.2f}")
+# ---------------------------
+# COMPONENT 5 — ANOMALY BONUS
+# ---------------------------
+def _anomaly_component(is_anomaly: bool, action: str, defect: str) -> tuple[float, str]:
+    """
+    Bonus for handling anomalous boards correctly.
+    For inference.py the LLM can't explicitly 'flag' anomalies, so we reward
+    it for choosing the safest action on anomaly boards.
+    """
+    if not is_anomaly:
+        return (0.5, "Normal board — no anomaly bonus/penalty")
+    # Best safe action on anomaly board
+    safe_actions = {
+        "none":              "PASS",
+        "missing_component": "ROUTE_COMPONENT_REPLACEMENT",
+        "solder_bridge":     "ROUTE_SOLDERING",
+        "short_circuit":     "SCRAP",
+    }
+    if action == safe_actions.get(defect):
+        return (1.0, "Correct safe action on anomaly board — BONUS")
+    elif action == "SCRAP":
+        return (0.6, "Conservative SCRAP on anomaly board")
+    else:
+        return (0.1, "Risky action on anomaly board — PENALTY")
+# ---------------------------
+# MASTER REWARD CALCULATOR
+# ---------------------------
+def calculate_reward(
+    pcb: Dict[str, Any],
+    action: str,
+    slots_state: List[int],
+    is_anomaly: bool = False,
+) -> RewardComponents:
+    """
+    Compute multi-component normalized reward for a (pcb, action) pair.
+    Args:
+        pcb:         dict with defect_type, component_cost, criticality
+        action:      one of the 6 valid action strings
+        slots_state: list of slot remaining times, e.g. [0, 2, 0]
+        is_anomaly:  whether this board was flagged as anomalous
+    Returns:
+        RewardComponents with individual scores and final normalized reward.
+    """
+    defect      = pcb["defect_type"]
+    cost        = pcb["component_cost"]
+    criticality = pcb["criticality"]
+    # Compute each component
+    d_score, d_expl = _defect_component(defect, action)
+    c_score, c_expl = _cost_component(defect, action, cost)
+    q_score, q_expl = _queue_component(action, slots_state)
+    r_score, r_expl = _criticality_component(defect, action, criticality)
+    a_score, a_expl = _anomaly_component(is_anomaly, action, defect)
+    # Weighted sum
+    raw = (
+        REWARD_WEIGHT_DEFECT      * d_score +
+        REWARD_WEIGHT_COST        * c_score +
+        REWARD_WEIGHT_QUEUE       * q_score +
+        REWARD_WEIGHT_CRITICALITY * r_score +
+        REWARD_WEIGHT_ANOMALY     * a_score
+    )
+    normalized = _clamp(raw)
+    # Build explanation
+    parts = [
+        f"[Defect {d_score:.2f}] {d_expl}",
+        f"[Cost {c_score:.2f}] {c_expl}",
+        f"[Queue {q_score:.2f}] {q_expl}",
+        f"[Risk {r_score:.2f}] {r_expl}",
+    ]
+    if is_anomaly:
+        parts.append(f"[Anomaly {a_score:.2f}] {a_expl}")
+    explanation = " | ".join(parts)
+    return RewardComponents(
+        defect_reward=round(d_score, 4),
+        cost_efficiency=round(c_score, 4),
+        queue_penalty=round(q_score, 4),
+        criticality_factor=round(r_score, 4),
+        anomaly_bonus=round(a_score, 4),
+        total_raw=round(raw, 4),
+        normalized=round(normalized, 4),
+        explanation=explanation,
+    )
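
Review note: calculate_reward() is a weighted sum of five component scores followed by a clamp to [0, 1]. The worked example below traces one case (solder_bridge routed to soldering, cost fraction 0.5, criticality 0.5, slots [0, 2, 0], no anomaly) using the component formulas from this file. The weights are hypothetical placeholders, since the real `REWARD_WEIGHT_*` values live in config.py and are not shown here.

```python
# Hypothetical weights; the real REWARD_WEIGHT_* constants are defined in config.py.
WEIGHTS = {
    "defect": 0.35,
    "cost": 0.20,
    "queue": 0.15,
    "criticality": 0.20,
    "anomaly": 0.10,
}


def clamp(x: float, lo: float = 0.0, hi: float = 1.0) -> float:
    return max(lo, min(hi, x))


def combine(scores: dict[str, float]) -> float:
    """Weighted sum of component scores, clamped to [0, 1]."""
    raw = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    return clamp(raw)


slots = [0, 2, 0]            # two free slots, one busy
free = slots.count(0)
cost_fraction = 0.5
criticality = 0.5

scores = {
    "defect": 1.0,                                               # correct soldering route
    "cost": clamp(0.6 + 0.3 * cost_fraction),                    # _cost_component
    "queue": clamp(0.6 + 0.4 * (1.0 - (free - 1) / len(slots))), # _queue_component
    "criticality": clamp(0.7 + 0.3 * criticality),               # optimal-action branch
    "anomaly": 0.5,                                              # normal board
}
normalized = combine(scores)
print(round(normalized, 4))
```

If the configured weights sum to 1.0, the clamp is only a safeguard, because every component is already in [0, 1].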

src/tasks.py ADDED Viewed

	@@ -0,0 +1,262 @@

+"""
+tasks.py — SpectraQual Task Definitions and Programmatic Graders
+Each task runs the environment with a fixed seed and scores the agent 0.0–1.0.
+Graders are deterministic and reproducible.
+"""
+from __future__ import annotations
+import sys
+import os
+from typing import List
+sys.path.insert(0, os.path.dirname(__file__))
+from config import (
+    TASKS,
+    MEDIUM_ECONOMIC_TARGET,
+    HARD_ANOMALY_RATE_TARGET,
+    SUCCESS_SCORE_THRESHOLD,
+)
+from models import TaskResult
+from env import SpectraQualEnv
+from models import PCBAction
+# ---------------------------
+# TASK RUNNER
+# ---------------------------
+def run_task(task_id: str, actions: List[str]) -> TaskResult:
+    """
+    Run a task with a pre-determined list of actions.
+    Used by graders to replay an agent's trajectory deterministically.
+    Args:
+        task_id: one of "task_easy", "task_medium", "task_hard"
+        actions:  list of action strings, one per step
+    Returns:
+        TaskResult with all episode metrics filled in.
+    """
+    cfg = TASKS[task_id]
+    env = SpectraQualEnv(task_id=task_id)
+    env.reset()
+    rewards:    List[float] = []
+    correct     = 0
+    total       = 0
+    bottlenecks = 0
+    anomaly_total   = 0
+    anomaly_flagged = 0
+    cum_raw     = 0.0
+    for i, action_str in enumerate(actions):
+        if env._done:
+            break
+        # Fall back to SCRAP only when the action string fails PCBAction validation
+        # (pydantic's ValidationError is a ValueError); errors raised inside
+        # env.step() are no longer swallowed and retried.
+        try:
+            action = PCBAction(action=action_str)
+        except ValueError:
+            action = PCBAction(action="SCRAP")
+        result = env.step(action)
+        rewards.append(result.reward)
+        total += 1
+        if result.info.get("is_anomaly"):
+            anomaly_total += 1
+        if result.reward_components:
+            cum_raw += result.reward_components.total_raw
+            if result.info.get("is_anomaly") and result.reward_components.anomaly_bonus >= 0.8:
+                anomaly_flagged += 1
+        if env._is_correct(result.info.get("defect", ""), action_str):
+            correct += 1
+        bottlenecks = env._bottleneck_cnt
+    max_possible_raw = cfg["n_boards"] * 1.0  # max normalized = 1.0 per step
+    return TaskResult(
+        task_id=task_id,
+        total_steps=total,
+        rewards=rewards,
+        correct_decisions=correct,
+        total_decisions=total,
+        bottleneck_count=bottlenecks,
+        anomaly_total=anomaly_total,
+        anomaly_flagged=anomaly_flagged,
+        cumulative_raw_reward=cum_raw,
+        max_possible_raw=max_possible_raw,
+    )
+# ---------------------------
+# GRADER: TASK EASY
+# ---------------------------
+def grade_easy(result: TaskResult) -> float:
+    """
+    Task Easy Grader.
+    Objective: Correctly classify all defect types. No slot pressure.
+    Scoring: 0.70 * accuracy + 0.30 * avg_reward → 0.0–1.0
+    - accuracy:   correct_decisions / total_decisions
+    - avg_reward: mean per-step normalized reward
+    Partial credit is linear in both terms; a score of 1.0 requires
+    100% accuracy and an average reward of 1.0.
+    """
+    if result.total_decisions == 0:
+        return 0.0
+    accuracy = result.correct_decisions / result.total_decisions
+    # Blend accuracy with average reward for robustness
+    avg_reward = sum(result.rewards) / len(result.rewards) if result.rewards else 0.0
+    # Weight: 70% accuracy, 30% reward quality
+    score = 0.70 * accuracy + 0.30 * avg_reward
+    return round(min(max(score, 0.0), 1.0), 4)
+# ---------------------------
+# GRADER: TASK MEDIUM
+# ---------------------------
+def grade_medium(result: TaskResult) -> float:
+    """
+    Task Medium Grader.
+    Objective: Triage 15 boards with 1 slot (queue pressure).
+    Scoring: 0.6 * economic_efficiency + 0.4 * bottleneck_avoidance
+    - economic_efficiency: avg normalized reward vs target
+    - bottleneck_avoidance: 1.0 if no bottlenecks, scales down to 0
+    """
+    if not result.rewards:
+        return 0.0
+    avg_reward = sum(result.rewards) / len(result.rewards)
+    # Economic efficiency: how close to target (MEDIUM_ECONOMIC_TARGET = 0.50)
+    economic_score = min(avg_reward / MEDIUM_ECONOMIC_TARGET, 1.0)
+    # Bottleneck avoidance: 0 bottleneck = 1.0, ≥5 = 0.0
+    max_tolerable_bottlenecks = 5
+    bottleneck_score = max(0.0, 1.0 - result.bottleneck_count / max_tolerable_bottlenecks)
+    score = 0.60 * economic_score + 0.40 * bottleneck_score
+    return round(min(max(score, 0.0), 1.0), 4)
+# ---------------------------
+# GRADER: TASK HARD
+# ---------------------------
+def grade_hard(result: TaskResult) -> float:
+    """
+    Task Hard Grader.
+    Objective: 20 boards, mixed anomalies, tight slots.
+    Scoring: 0.5 * anomaly_score + 0.3 * economic_score + 0.2 * throughput_score
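+    - anomaly_score:    anomaly_flagged / anomaly_total, scaled by HARD_ANOMALY_RATE_TARGET
+                        (target ≥ 0.5) and capped at 1.0; 1.0 if no anomalies appeared
+    - economic_score:   avg normalized reward
+    - throughput_score: decisions made / n_boards (penalizes trajectories that stop early;
+                        a WAIT still counts as a decision)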
+    """
+    if not result.rewards:
+        return 0.0
+    cfg = TASKS["task_hard"]
+    avg_reward = sum(result.rewards) / len(result.rewards)
+    # Anomaly score: did the agent handle anomalous boards correctly?
+    if result.anomaly_total > 0:
+        raw_anomaly = result.anomaly_flagged / result.anomaly_total
+    else:
+        raw_anomaly = 1.0  # no anomalies → not penalized
+    # Scale anomaly score: reaching HARD_ANOMALY_RATE_TARGET earns the full 1.0
+    anomaly_score = min(raw_anomaly / HARD_ANOMALY_RATE_TARGET, 1.0)
+    # Economic score
+    economic_score = avg_reward
+    # Throughput: fraction of the task's boards that received a decision
+    throughput_score = min(result.total_decisions / cfg["n_boards"], 1.0)
+    score = (
+        0.50 * anomaly_score +
+        0.30 * economic_score +
+        0.20 * throughput_score
+    )
+    return round(min(max(score, 0.0), 1.0), 4)
+# ---------------------------
+# GRADER DISPATCH
+# ---------------------------
+GRADERS = {
+    "task_easy":   grade_easy,
+    "task_medium": grade_medium,
+    "task_hard":   grade_hard,
+}
+def grade(task_id: str, result: TaskResult) -> float:
+    """Dispatch to the correct grader for the given task_id."""
+    if task_id not in GRADERS:
+        raise ValueError(f"No grader for task_id='{task_id}'")
+    return GRADERS[task_id](result)
+# ---------------------------
+# TASK DESCRIPTIONS (for README / inference prompt)
+# ---------------------------
+TASK_DESCRIPTIONS = {
+    "task_easy": (
+        "Triage 10 PCBs with no factory slot pressure. "
+        "Focus: identify the correct action for each defect type. "
+        "Grader: accuracy-weighted reward (70% accuracy + 30% reward quality). "
+        "Expected frontier model score: ≥0.85."
+    ),
+    "task_medium": (
+        "Triage 15 PCBs with only 1 active soldering slot. "
+        "Focus: manage queue pressure while maintaining economic performance. "
+        "Grader: 60% economic efficiency + 40% bottleneck avoidance. "
+        "Expected frontier model score: ≥0.65."
+    ),
+    "task_hard": (
+        "Triage 20 PCBs with 25% anomaly rate and tight slot constraints. "
+        "Focus: handle extreme-cost/criticality boards safely AND maintain throughput. "
+        "Grader: 50% anomaly handling + 30% economic score + 20% throughput. "
+        "Expected frontier model score: ≥0.50."
+    ),
+}
+# ---------------------------
+# CLI TEST UTILITY
+# ---------------------------
+if __name__ == "__main__":
+    # Quick sanity check: run all 3 tasks with a rule-based agent.
+    from env import SpectraQualEnv, decide_action
+    from models import PCBAction
+    print("\n=== SpectraQual Task Grader Sanity Check ===\n")
+    for tid in ["task_easy", "task_medium", "task_hard"]:
+        env = SpectraQualEnv(task_id=tid)
+        result_obj = env.reset()
+        actions = []
+        while not result_obj.done:
+            obs = result_obj.observation
+            pcb = {
+                "defect_type":    obs.defect_type,
+                "component_cost": obs.component_cost,
+                "criticality":    obs.criticality,
+            }
+            action_str  = decide_action(pcb)
+            actions.append(action_str)
+            result_obj  = env.step(PCBAction(action=action_str))
+        task_result = run_task(tid, actions)
+        score       = grade(tid, task_result)
+        print(
+            f"[{tid}] Score: {score:.4f} | "
+            f"Correct: {task_result.correct_decisions}/{task_result.total_decisions} | "
+            f"Bottlenecks: {task_result.bottleneck_count}"
+        )
+    print("\n=== Done ===")
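
Note on the hard-task grader: the 50/30/20 weighting in `grade_hard` can be reproduced in isolation as a quick check of the arithmetic. The sketch below is a minimal stand-alone version, not the module's code: `hard_score_sketch` and its `anomaly_target` default of 0.8 are hypothetical (the real value of `HARD_ANOMALY_RATE_TARGET` lives in `config.py` and is not shown here), and `raw_anomaly` and `avg_reward` are taken as plain inputs rather than derived from a `TaskResult`.

```python
def hard_score_sketch(raw_anomaly, avg_reward, total_decisions, n_boards,
                      anomaly_target=0.8):
    """Weighted hard-task score; anomaly_target is a hypothetical stand-in."""
    # Anomaly handling: reaching the target rate earns full credit.
    anomaly_score = min(raw_anomaly / anomaly_target, 1.0)
    # Economic score: the average per-step reward, used as-is.
    economic_score = avg_reward
    # Throughput: fraction of boards that received a decision, capped at 1.0.
    throughput_score = min(total_decisions / n_boards, 1.0)
    score = (
        0.50 * anomaly_score +
        0.30 * economic_score +
        0.20 * throughput_score
    )
    # Clamp to [0, 1] and round, as the grader does.
    return round(min(max(score, 0.0), 1.0), 4)


if __name__ == "__main__":
    print(hard_score_sketch(raw_anomaly=0.4, avg_reward=0.5,
                            total_decisions=20, n_boards=20))
```

Because the final clamp is applied after weighting, an out-of-range `avg_reward` can still move the combined score before it is clipped; clamping each component first would be a different design.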

src/train.py ADDED Viewed

	@@ -0,0 +1,31 @@

+from env import generate_pcb, calculate_reward, update_factory, factory
+from agent import get_state, choose_action, update_q
+EPISODES = 500
+STEPS_PER_EPISODE = 20   # multi-step episodes
+for ep in range(EPISODES):
+    factory["soldering_slots"] = [0, 0, 0]
+    pcb = generate_pcb()
+    state = get_state(pcb, factory)
+    for _ in range(STEPS_PER_EPISODE):  # loop index is unused
+        action = choose_action(state)
+        update_factory()
+        reward = calculate_reward(pcb, action)
+        next_pcb = generate_pcb()
+        next_state = get_state(next_pcb, factory)
+        update_q(state, action, reward, next_state)
+        # move forward
+        pcb = next_pcb
+        state = next_state
+print("Training Complete")
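
Note on the training loop: `update_q` is imported from `agent.py`, which is not shown in this part of the diff. Assuming it follows the standard tabular Q-learning rule, the update it performs would look roughly like the sketch below. Everything here is illustrative: `ALPHA`, `GAMMA`, `q_table`, and the action labels are hypothetical names and values, not taken from the commit.

```python
from collections import defaultdict

# Hypothetical action labels and hyperparameters, for illustration only.
SKETCH_ACTIONS = ["ACTION_A", "ACTION_B", "WAIT"]
ALPHA = 0.1   # learning rate (assumed)
GAMMA = 0.9   # discount factor (assumed)

# Q-table mapping state -> {action: value}, zero-initialised on first access.
q_table = defaultdict(lambda: {a: 0.0 for a in SKETCH_ACTIONS})


def update_q(state, action, reward, next_state):
    """One tabular Q-learning step: move Q(s, a) toward the TD target."""
    best_next = max(q_table[next_state].values())
    td_target = reward + GAMMA * best_next
    q_table[state][action] += ALPHA * (td_target - q_table[state][action])


if __name__ == "__main__":
    update_q("s0", "WAIT", 1.0, "s1")
    print(q_table["s0"]["WAIT"])
```

States must be hashable for this table to work, so a `get_state` helper in this style would typically return a tuple or string.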

verify.py ADDED Viewed

	@@ -0,0 +1,36 @@

+import sys
+sys.path.insert(0, 'src')
+from config import TASKS, ACTIONS, VALID_ACTIONS
+from models import PCBObservation, PCBAction, RewardComponents, StepResult
+from reward import calculate_reward, detect_anomaly
+from env import SpectraQualEnv
+print("--- Module imports: OK ---")
+# Test reset and step
+env = SpectraQualEnv("task_easy")
+r = env.reset()
+print(f"reset() -> defect={r.observation.defect_type}, step={r.observation.step}, done={r.done}")
+action = r.observation.valid_actions[0]
+r2 = env.step(PCBAction(action=action))
+print(f"step({action}) -> reward={r2.reward:.4f}, done={r2.done}")
+print(f"  expl: {r2.reward_components.explanation[:80]}")
+state = env.state()
+print(f"state() -> step={state['step']}, accuracy={state['rolling_accuracy']}")
+# Test all 3 tasks
+for tid in ["task_easy", "task_medium", "task_hard"]:
+    e = SpectraQualEnv(task_id=tid)
+    rr = e.reset()
+    steps = 0
+    while not rr.done and steps < 30:
+        action_str = rr.observation.valid_actions[0]
+        rr = e.step(PCBAction(action=action_str))
+        steps += 1
+    s = e.state()
+    print(f"[{tid}] steps={steps}, cum_reward={s['cumulative_reward']:.4f}, accuracy={s['rolling_accuracy']:.2%}")
+print("--- All tests: PASS ---")