Spaces: Ajay00747/CyberSOC

Ajayyy00 committed on Apr 3

Commit bb0d7fd

1 Parent(s): f6c80b9

Initial commit: CyberSOC Enterprise Environment Baseline

Files changed (37)

Dockerfile +19 -0
README.md +44 -7
__init__.py +33 -0
__pycache__/__init__.cpython-311.pyc +0 -0
__pycache__/client.cpython-311.pyc +0 -0
__pycache__/inference.cpython-311.pyc +0 -0
__pycache__/models.cpython-311.pyc +0 -0
client.py +99 -0
demo_scripted.py +148 -0
eval_100.py +90 -0
inference.py +322 -0
models.py +333 -0
openenv.yaml +0 -0
openenv_play.egg-info/PKG-INFO +9 -0
openenv_play.egg-info/SOURCES.txt +14 -0
openenv_play.egg-info/dependency_links.txt +1 -0
openenv_play.egg-info/entry_points.txt +2 -0
openenv_play.egg-info/requires.txt +5 -0
openenv_play.egg-info/top_level.txt +1 -0
pyproject.toml +36 -0
requirements.txt +7 -0
server/Dockerfile +80 -0
server/__init__.py +11 -0
server/__pycache__/__init__.cpython-311.pyc +0 -0
server/__pycache__/app.cpython-311.pyc +0 -0
server/__pycache__/graders.cpython-311.pyc +0 -0
server/__pycache__/play_environment.cpython-311.pyc +0 -0
server/__pycache__/task_generator.cpython-311.pyc +0 -0
server/__pycache__/tasks.cpython-311.pyc +0 -0
server/app.py +62 -0
server/graders.py +212 -0
server/play_environment.py +594 -0
server/requirements.txt +3 -0
server/task_generator.py +627 -0
server/tasks.py +513 -0
uv.lock +0 -0
validate_submission.sh +159 -0

Dockerfile ADDED

	@@ -0,0 +1,19 @@

+FROM python:3.10-slim
+# Create user with home directory
+RUN useradd -m -u 1000 user
+USER user
+ENV PATH="/home/user/.local/bin:$PATH"
+WORKDIR /app
+# Copy requirements and install
+COPY --chown=user ./requirements.txt requirements.txt
+RUN pip install --no-cache-dir --upgrade -r requirements.txt
+# Copy all environment files
+COPY --chown=user . /app
+# The hackathon expects the OpenEnv Server to run on 7860 for Spaces Gradio endpoints
+# We will use uvicorn to host the app which complies with the spec
+CMD ["uvicorn", "server.app:app", "--host", "0.0.0.0", "--port", "7860"]

README.md CHANGED

@@ -1,11 +1,48 @@
 ---
-title: CyberSOC
-emoji: 👁
-colorFrom: yellow
-colorTo: green
 sdk: docker
-pinned: false
-license: mit
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

 ---
+title: CyberSOC Enterprise Environment
+emoji: 🛡️
+colorFrom: blue
+colorTo: indigo
 sdk: docker
+app_port: 7860
 ---
+# CyberSOC: Enterprise Network Defense Environment 🛡️
+CyberSOC is an OpenEnv reinforcement learning environment that evaluates AI agents on incident response across a simulated 500-node enterprise network.
+## 🌟 Hackathon Highlights
+This is not a toy benchmark. This environment models real-world enterprise infrastructure:
+1. **Massive Procedural Variety (1,000 Tasks):**
+   Instead of hardcoded puzzles, CyberSOC features a seed-based procedural generation engine. We dynamically spin up **1,000 unique network topologies** containing a mix of 12 distinct attack vectors (from Supply Chain to Ransomware). This variety makes it much harder for agents to overfit to a fixed set of scenarios.
+2. **Dense, Business-Aligned Grading:**
+   Unlike simple pass/fail benchmarks, CyberSOC uses intelligent reward shaping. Agents earn rewards for hunting down malicious processes and blocking IOCs mid-investigation. However, they are heavily penalized for increasing "Business Downtime" (quarantining healthy subnets haphazardly). They must balance security guarantees with business continuity.
+3. **Complex State & Action Space:**
+   Agents must use structured tools (Pydantic models) to traverse the environment:
+   - `query_host`: Map the active topology.
+   - `run_forensics`: Scrape memory and process lists.
+   - `kill_process` & `block_ioc`: Perform active containment.
+   - `isolate_segment`: Implement extreme fail-safes.
+   - `submit_containment_plan`: Formulate a final executive overview.
+4. **Flawless Inference Benchmarking:**
+   The included `inference.py` provides an out-of-the-box evaluation loop. We have successfully benchmarked state-of-the-art LLMs (like Qwen2.5-72B and Llama-3.3-70B) natively within this environment using standard OpenAI/Groq clients.
+## 🚀 Running the Environment
+This repository is fully packaged as a Docker container.
+### Local Execution:
+```bash
+python inference.py
+```
+### Agent Configuration
+To run your own agent, define:
+- `API_KEY` (or `HF_TOKEN`): your LLM API token
+- `API_BASE_URL`: the endpoint you are hitting
+- `MODEL_NAME`: the target model identifier
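
A minimal sketch of how these variables might be resolved, mirroring the fallbacks that `inference.py` in this commit uses (`HF_TOKEN` takes precedence over `API_KEY`, and the endpoint and model fall back to defaults). The `load_agent_config` helper and its `env` parameter are hypothetical names introduced only for illustration; they are not part of the repository.

```python
import os
from typing import Dict, Mapping, Optional


def load_agent_config(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Hypothetical helper: resolve agent settings the way inference.py does."""
    env = os.environ if env is None else env
    return {
        # HF_TOKEN wins over API_KEY; empty string when neither is set.
        "api_key": env.get("HF_TOKEN") or env.get("API_KEY") or "",
        # Defaults copied from the constants in inference.py.
        "api_base_url": env.get("API_BASE_URL") or "https://router.huggingface.co/v1",
        "model_name": env.get("MODEL_NAME") or "Qwen/Qwen2.5-72B-Instruct",
    }


if __name__ == "__main__":
    # Passing a plain dict keeps the example independent of the real environment.
    print(load_agent_config({"API_KEY": "sk-example", "MODEL_NAME": "my-model"}))
```

Accepting a mapping instead of reading `os.environ` directly is only a convenience for testing; the script itself reads the process environment at import time.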

__init__.py ADDED

	@@ -0,0 +1,33 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+"""CyberSOCEnv — Enterprise Cybersecurity Operations Center Environment."""
+from .client import CyberSOCClient
+from .models import (
+    SOCObservation,
+    SOCActionWrapper,
+    SOCState,
+    QueryHost,
+    IsolateSegment,
+    BlockIOC,
+    RunForensics,
+    KillProcess,
+    SubmitContainmentPlan,
+)
+__all__ = [
+    "CyberSOCClient",
+    "SOCObservation",
+    "SOCActionWrapper",
+    "SOCState",
+    "QueryHost",
+    "IsolateSegment",
+    "BlockIOC",
+    "RunForensics",
+    "KillProcess",
+    "SubmitContainmentPlan",
+]

__pycache__/__init__.cpython-311.pyc ADDED

Binary file (717 Bytes).

__pycache__/client.cpython-311.pyc ADDED

Binary file (5.1 kB).

__pycache__/inference.cpython-311.pyc ADDED

Binary file (15.7 kB).

__pycache__/models.cpython-311.pyc ADDED

Binary file (20.5 kB).

client.py ADDED

	@@ -0,0 +1,99 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+"""CyberSOCEnv Client — connects to the SOC environment server."""
+from typing import Dict
+from openenv.core import EnvClient
+from openenv.core.client_types import StepResult
+from .models import (
+    SOCObservation,
+    SOCActionWrapper,
+    SOCState,
+    Alert,
+    Severity,
+    ThreatType,
+    NetworkTopology,
+    ForensicsResult,
+    TimelineEntry,
+)
+class CyberSOCClient(
+    EnvClient[SOCActionWrapper, SOCObservation, SOCState]
+):
+    """
+    Client for the CyberSOCEnv environment.
+    Connects via WebSocket to the SOC environment server for
+    low-latency, persistent-session interaction.
+    Example:
+        >>> with CyberSOCClient(base_url="http://localhost:8000") as client:
+        ...     result = client.reset()
+        ...     print(result.observation.alert_queue)
+        ...
+        ...     from play.models import SOCActionWrapper
+        ...     result = client.step(SOCActionWrapper(type="query_host", hostname="WS-001"))
+        ...     print(result.observation.host_forensics)
+    """
+    def _step_payload(self, action: SOCActionWrapper) -> Dict:
+        """Convert SOCActionWrapper to JSON payload for step message."""
+        return action.model_dump(exclude_none=True)
+    def _parse_result(self, payload: Dict) -> StepResult[SOCObservation]:
+        """Parse server response into StepResult[SOCObservation]."""
+        obs_data = payload.get("observation", {})
+        # Parse alerts
+        alerts = [Alert(**a) for a in obs_data.get("alert_queue", [])]
+        # Parse network topology
+        topo_data = obs_data.get("network_topology", {})
+        topology = NetworkTopology(**topo_data) if topo_data else NetworkTopology()
+        # Parse forensics (may be None)
+        forensics_data = obs_data.get("host_forensics")
+        forensics = ForensicsResult(**forensics_data) if forensics_data else None
+        # Parse timeline
+        timeline = [TimelineEntry(**t) for t in obs_data.get("timeline", [])]
+        observation = SOCObservation(
+            alert_queue=alerts,
+            network_topology=topology,
+            host_forensics=forensics,
+            timeline=timeline,
+            business_impact_score=obs_data.get("business_impact_score", 0.0),
+            step_count=obs_data.get("step_count", 0),
+            active_threats=obs_data.get("active_threats", []),
+            max_steps=obs_data.get("max_steps", 30),
+            task_id=obs_data.get("task_id", "easy"),
+            total_reward=obs_data.get("total_reward", 0.0),
+            final_score=obs_data.get("final_score"),
+            grade_breakdown=obs_data.get("grade_breakdown"),
+            done=payload.get("done", False),
+            reward=payload.get("reward"),
+        )
+        return StepResult(
+            observation=observation,
+            reward=payload.get("reward"),
+            done=payload.get("done", False),
+        )
+    def _parse_state(self, payload: Dict) -> SOCState:
+        """Parse server response into SOCState."""
+        return SOCState(
+            episode_id=payload.get("episode_id"),
+            step_count=payload.get("step_count", 0),
+            task_id=payload.get("task_id", "easy"),
+            total_reward=payload.get("total_reward", 0.0),
+            business_impact=payload.get("business_impact", 0.0),
+        )
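
The defaulting behaviour of `_parse_result` can be sketched without the real models. The example below is a simplified stand-in under stated assumptions: `ObservationSketch` and `parse_result_sketch` are hypothetical names, a stdlib dataclass replaces the Pydantic `SOCObservation`, and only a subset of fields is shown. It illustrates that missing keys fall back to the same defaults used above, and that `done` and `reward` are read from the top level of the payload rather than from `observation`.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ObservationSketch:
    """Simplified stand-in for SOCObservation (not the real Pydantic model)."""
    alert_queue: List[Dict[str, Any]] = field(default_factory=list)
    step_count: int = 0
    max_steps: int = 30
    task_id: str = "easy"
    final_score: Optional[float] = None
    done: bool = False
    reward: Optional[float] = None


def parse_result_sketch(payload: Dict[str, Any]) -> ObservationSketch:
    """Fill missing fields with the same defaults _parse_result uses."""
    obs = payload.get("observation", {})
    return ObservationSketch(
        alert_queue=obs.get("alert_queue", []),
        step_count=obs.get("step_count", 0),
        max_steps=obs.get("max_steps", 30),
        task_id=obs.get("task_id", "easy"),
        final_score=obs.get("final_score"),
        # done and reward live at the top level of the payload.
        done=payload.get("done", False),
        reward=payload.get("reward"),
    )


if __name__ == "__main__":
    print(parse_result_sketch({"observation": {"step_count": 3}, "reward": 0.1}))
```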

demo_scripted.py ADDED

	@@ -0,0 +1,148 @@

+#!/usr/bin/env python3
+"""
+Scripted CyberSOCEnv Demo — runs optimal actions for all 3 tasks
+without requiring an LLM. Demonstrates the full environment pipeline.
+"""
+import asyncio
+import json
+import websockets
+WS_URL = "ws://127.0.0.1:8000/ws"
+# Pre-scripted optimal action sequences for each task
+SCRIPTS = {
+    "easy": [
+        {"type": "query_host", "hostname": "WS-042"},
+        {"type": "run_forensics", "hostname": "WS-042"},
+        {"type": "kill_process", "hostname": "WS-042", "process_name": "cryptolocker.exe"},
+        {"type": "block_ioc", "ioc_value": "e99a18c428cb38d5f260853678922e03", "ioc_type": "hash"},
+        {"type": "submit_containment_plan",
+         "plan": [{"threat_id": "T-EASY-001", "actions_taken": ["killed cryptolocker.exe", "blocked hash", "ran forensics"], "root_cause": "ransomware via user download", "confidence": 0.95}],
+         "executive_summary": "Single ransomware on WS-042 fully contained. Process killed, IOC blocked."},
+    ],
+    "medium": [
+        {"type": "run_forensics", "hostname": "WS-017"},
+        {"type": "kill_process", "hostname": "WS-017", "process_name": "powershell.exe"},
+        {"type": "kill_process", "hostname": "WS-017", "process_name": "mimikatz.exe"},
+        {"type": "block_ioc", "ioc_value": "evil-login.example.com", "ioc_type": "domain"},
+        {"type": "block_ioc", "ioc_value": "d41d8cd98f00b204e9800998ecf8427e", "ioc_type": "hash"},
+        {"type": "run_forensics", "hostname": "DEV-033"},
+        {"type": "kill_process", "hostname": "DEV-033", "process_name": "svchost_backdoor.exe"},
+        {"type": "run_forensics", "hostname": "FIN-012"},
+        {"type": "kill_process", "hostname": "FIN-012", "process_name": "svchost_backdoor.exe"},
+        {"type": "block_ioc", "ioc_value": "203.0.113.50", "ioc_type": "ip"},
+        {"type": "block_ioc", "ioc_value": "aabbccdd11223344eeff5566778899aa", "ioc_type": "hash"},
+        {"type": "block_ioc", "ioc_value": "112233445566778899aabbccddeeff00", "ioc_type": "hash"},
+        {"type": "submit_containment_plan",
+         "plan": [
+             {"threat_id": "T-MED-001", "actions_taken": ["killed powershell.exe", "blocked evil-login.example.com"], "root_cause": "phishing email with macro", "confidence": 0.9},
+             {"threat_id": "T-MED-002", "actions_taken": ["killed mimikatz.exe", "blocked hash"], "root_cause": "credential theft via Mimikatz", "confidence": 0.95},
+             {"threat_id": "T-MED-003", "actions_taken": ["killed svchost_backdoor on DEV-033 and FIN-012", "blocked C2 IP"], "root_cause": "lateral movement using stolen creds", "confidence": 0.9},
+         ],
+         "executive_summary": "Multi-stage attack contained: phishing -> cred theft -> lateral movement across 3 hosts."},
+    ],
+    "hard": [
+        {"type": "block_ioc", "ioc_value": "198.51.100.77", "ioc_type": "ip"},
+        {"type": "block_ioc", "ioc_value": "cdn-update.malware-c2.net", "ioc_type": "domain"},
+        {"type": "run_forensics", "hostname": "EXEC-003"},
+        {"type": "kill_process", "hostname": "EXEC-003", "process_name": "outlook_macro.exe"},
+        {"type": "kill_process", "hostname": "EXEC-003", "process_name": "svchost_c2.exe"},
+        {"type": "run_forensics", "hostname": "WS-088"},
+        {"type": "kill_process", "hostname": "WS-088", "process_name": "svchost_c2.exe"},
+        {"type": "run_forensics", "hostname": "SRV-002"},
+        {"type": "kill_process", "hostname": "SRV-002", "process_name": "exploit_kernel.exe"},
+        {"type": "kill_process", "hostname": "SRV-002", "process_name": "data_pump.exe"},
+        {"type": "block_ioc", "ioc_value": "203.0.113.99", "ioc_type": "ip"},
+        {"type": "block_ioc", "ioc_value": "exfil.malware-c2.net", "ioc_type": "domain"},
+        {"type": "run_forensics", "hostname": "FIN-008"},
+        {"type": "kill_process", "hostname": "FIN-008", "process_name": "data_pump.exe"},
+        {"type": "run_forensics", "hostname": "SRV-010"},
+        {"type": "kill_process", "hostname": "SRV-010", "process_name": "blackcat_ransom.exe"},
+        {"type": "kill_process", "hostname": "SRV-015", "process_name": "blackcat_ransom.exe"},
+        {"type": "block_ioc", "ioc_value": "deadbeef0123456789abcdef01234567", "ioc_type": "hash"},
+        {"type": "block_ioc", "ioc_value": "cafebabe9876543210fedcba98765432", "ioc_type": "hash"},
+        {"type": "submit_containment_plan",
+         "plan": [
+             {"threat_id": "T-HARD-001", "actions_taken": ["killed outlook_macro.exe", "blocked C2"], "root_cause": "spearphishing executive VP", "confidence": 0.95},
+             {"threat_id": "T-HARD-002", "actions_taken": ["killed svchost_c2.exe on 2 hosts", "blocked C2 domains"], "root_cause": "C2 beaconing via encrypted channel", "confidence": 0.9},
+             {"threat_id": "T-HARD-003", "actions_taken": ["killed exploit_kernel.exe"], "root_cause": "kernel exploit for privilege escalation on SRV-002", "confidence": 0.9},
+             {"threat_id": "T-HARD-004", "actions_taken": ["killed data_pump.exe on SRV-002 and FIN-008", "blocked exfil IP/domain"], "root_cause": "data exfiltration of PII and financial records", "confidence": 0.85},
+             {"threat_id": "T-HARD-005", "actions_taken": ["killed blackcat_ransom.exe on SRV-010 and SRV-015"], "root_cause": "BlackCat ransomware deployment on production storage", "confidence": 0.95},
+         ],
+         "executive_summary": "APT campaign fully contained: initial access via exec phishing, C2 cut, privilege escalation stopped, exfiltration blocked, ransomware neutralized."},
+    ],
+}
+async def run_task(task_id: str):
+    """Run a single task with scripted optimal actions."""
+    print(f"\n{'='*60}")
+    print(f"[START] task={task_id} env=cybersocenv model=scripted-optimal")
+    print(f"{'='*60}")
+    async with websockets.connect(WS_URL) as ws:
+        # Reset
+        await ws.send(json.dumps({"type": "reset", "data": {"task_id": task_id}}))
+        resp = json.loads(await ws.recv())
+        data = resp.get("data", {})
+        obs = data.get("observation", {})
+        print(f"  Reset: alerts={len(obs.get('alert_queue', []))}, threats={obs.get('active_threats', [])}")
+        # Execute scripted actions
+        rewards = []
+        for i, action in enumerate(SCRIPTS[task_id], 1):
+            await ws.send(json.dumps({"type": "step", "data": action}))
+            resp = json.loads(await ws.recv())
+            if resp.get("type") == "error":
+                print(f"  [STEP] step={i} action={action['type']} ERROR: {resp.get('data', {})}")
+                continue
+            data = resp.get("data", {})
+            obs = data.get("observation", {})
+            reward = data.get("reward", 0)
+            done = data.get("done", False)
+            rewards.append(reward)
+            print(f"  [STEP] step={i} action={action['type']} reward={reward:.2f} done={done}")
+            if done:
+                score = obs.get("final_score") or 0.0
+                breakdown = obs.get("grade_breakdown") or {}
+                print(f"\n  FINAL SCORE: {score:.4f}")
+                if breakdown:
+                    print(f"  Threats contained: {breakdown.get('threats_contained')}/{breakdown.get('total_threats')}")
+                    print(f"  IOCs blocked: {breakdown.get('iocs_blocked')}")
+                    print(f"  Hosts forensics: {breakdown.get('hosts_forensics')}")
+                    print(f"  Business impact: {breakdown.get('business_impact'):.4f}")
+                total_r = sum(rewards)
+                rewards_str = ",".join(f"{r:.2f}" for r in rewards)
+                print(f"  [END] success={score > 0.5} steps={i} score={score:.3f} rewards={rewards_str}")
+                break
+        await ws.send(json.dumps({"type": "close"}))
+    # score is only bound if the episode reached done=True
+    return locals().get("score", 0.0)
+async def main():
+    print("# CyberSOCEnv Scripted Optimal Agent Demo")
+    print("# This runs pre-computed optimal actions for all 3 tasks")
+    print("# to demonstrate the environment and grading pipeline.\n")
+    scores = {}
+    for task_id in ["easy", "medium", "hard"]:
+        score = await run_task(task_id)
+        scores[task_id] = score
+    print(f"\n{'='*60}")
+    print("# FINAL RESULTS")
+    print(f"{'='*60}")
+    for tid, s in scores.items():
+        print(f"  {tid:8s}: {s:.4f}")
+    avg = sum(scores.values()) / len(scores)
+    print(f"\n  Average: {avg:.4f}")
+if __name__ == "__main__":
+    asyncio.run(main())
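
The message framing this demo sends over the WebSocket can be isolated into a few pure functions. This is a sketch inferred only from the `ws.send` and `ws.recv` calls above; the helper names (`reset_frame`, `step_frame`, `close_frame`, `read_reply`) are hypothetical, and the server may accept or emit fields not shown here. Unlike the demo, which logs an error frame and continues, `read_reply` raises, which is a design choice of this sketch.

```python
import json
from typing import Any, Dict


def reset_frame(task_id: str) -> str:
    """Frame that starts a new episode for the given task."""
    return json.dumps({"type": "reset", "data": {"task_id": task_id}})


def step_frame(action: Dict[str, Any]) -> str:
    """Frame that submits one action dict (e.g. a query_host action)."""
    return json.dumps({"type": "step", "data": action})


def close_frame() -> str:
    """Frame that ends the session."""
    return json.dumps({"type": "close"})


def read_reply(raw: str) -> Dict[str, Any]:
    """Decode a server reply and return its data; raise on error frames."""
    resp = json.loads(raw)
    if resp.get("type") == "error":
        raise RuntimeError(f"server error: {resp.get('data', {})}")
    return resp.get("data", {})


if __name__ == "__main__":
    print(reset_frame("easy"))
    print(step_frame({"type": "query_host", "hostname": "WS-042"}))
    print(close_frame())
```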

eval_100.py ADDED

	@@ -0,0 +1,90 @@

+#!/usr/bin/env python3
+import asyncio
+import json
+import os
+import sys
+from openai import OpenAI
+from tenacity import retry, wait_exponential, stop_after_attempt, retry_if_exception_type
+# Ensure we can import from play
+sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), "..")))
+from play.inference import run_episode, API_BASE_URL, API_KEY, MODEL_NAME
+from play.server.task_generator import list_generated_task_ids
+RESULTS_FILE = os.path.join(os.path.dirname(os.path.abspath(__file__)), "eval_results_100.json")
+@retry(
+    wait=wait_exponential(multiplier=2, min=5, max=120),
+    stop=stop_after_attempt(10),
+    retry=retry_if_exception_type(Exception)
+)
+async def resilient_run_episode(client, task_id):
+    try:
+        return await run_episode(client, task_id)
+    except Exception as exc:
+        if "429" in str(exc) or "RateLimit" in str(exc):
+            print(f"\n[!] Rate limit hit for {task_id}. Backing off... ({exc})\n", flush=True)
+            raise  # Trigger tenacity retry
+        print(f"\n[!] Unknown error directly in resilient wrapper: {exc}\n", flush=True)
+        raise
+async def main():
+    print(f"=== Starting Batch Evaluation (100 Tasks) ===", flush=True)
+    print(f"Model: {MODEL_NAME} | Endpoint: {API_BASE_URL}", flush=True)
+    client = OpenAI(base_url=API_BASE_URL, api_key=API_KEY)
+    # Load checkpoint
+    results = {}
+    if os.path.exists(RESULTS_FILE):
+        with open(RESULTS_FILE, "r") as f:
+            results = json.load(f)
+            print(f"Loaded {len(results)} existing results from checkpoint.", flush=True)
+    tasks = list_generated_task_ids(100)
+    success_count = 0
+    total_score = 0.0
+    for i, task_id in enumerate(tasks):
+        if task_id in results:
+            score = results[task_id]["score"]
+            if results[task_id]["success"]:
+                success_count += 1
+            total_score += score
+            continue
+        print(f"\n--- Evaluating {i+1}/100: {task_id} ---", flush=True)
+        try:
+            success, steps, score, rewards = await resilient_run_episode(client, task_id)
+            results[task_id] = {
+                "success": success,
+                "steps": steps,
+                "score": score,
+                "rewards": rewards
+            }
+            if success:
+                success_count += 1
+            total_score += score
+            # Save checkpoint
+            with open(RESULTS_FILE, "w") as f:
+                json.dump(results, f, indent=2)
+        except Exception as e:
+            print(f"\n[FATAL] FAILED TO EVALUATE {task_id} AFTER RETRIES. Error: {e}", flush=True)
+            break
+    completed = len(results)
+    if completed > 0:
+        print(f"\n=== FINAL EVALUATION SUMMARY ===", flush=True)
+        print(f"Tasks Completed: {completed}/100", flush=True)
+        print(f"Success Rate:    {success_count}/{completed} ({(success_count/completed)*100:.1f}%)", flush=True)
+        print(f"Average Score:   {total_score/completed:.3f}", flush=True)
+if __name__ == "__main__":
+    asyncio.run(main())
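
The checkpoint-and-resume pattern in this script can be sketched in isolation. The version below is an illustration under assumptions, not the script's own code: the function names are hypothetical, `evaluate` stands in for `resilient_run_episode`, and the save step swaps in an atomic write (temp file plus `os.replace`) where the script writes the results file directly, so an interrupted run cannot leave a half-written checkpoint.

```python
import json
import os
import tempfile
from typing import Callable, Dict, Iterable


def load_checkpoint(path: str) -> Dict[str, dict]:
    """Return previously saved results, or an empty dict on first run."""
    if not os.path.exists(path):
        return {}
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def save_checkpoint(path: str, results: Dict[str, dict]) -> None:
    """Write to a temp file in the same directory, then atomically swap it in."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(results, f, indent=2)
    os.replace(tmp_path, path)


def run_with_checkpoint(
    path: str,
    task_ids: Iterable[str],
    evaluate: Callable[[str], dict],
) -> Dict[str, dict]:
    """Evaluate only the tasks missing from the checkpoint, saving after each."""
    results = load_checkpoint(path)
    for task_id in task_ids:
        if task_id in results:
            continue  # already evaluated in an earlier run
        results[task_id] = evaluate(task_id)
        save_checkpoint(path, results)
    return results
```

Saving after every task costs one small file write per episode, which is negligible next to an LLM-driven episode and keeps a crash from losing more than the task in flight.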

inference.py ADDED

	@@ -0,0 +1,322 @@

+#!/usr/bin/env python3
+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+"""
+CyberSOCEnv Baseline Inference Script.
+HACKATHON RULES:
+  - File must be named inference.py in the project root
+  - Must use OpenAI Client for all LLM calls
+  - Must emit structured stdout logs: [START], [STEP], [END]
+  - Runtime < 20 minutes
+  - Must work on vcpu=2, memory=8gb
+Environment Variables:
+    API_BASE_URL  - The API endpoint for the LLM
+    MODEL_NAME    - The model identifier to use for inference
+    HF_TOKEN      - Your Hugging Face / API key
+"""
+import asyncio
+import json
+import os
+import textwrap
+from typing import Any, Dict, List, Optional
+from openai import OpenAI
+from play.models import SOCActionWrapper, SOCObservation
+from play.server.play_environment import CyberSOCEnvironment
+# =============================================================================
+# Configuration (from environment variables)
+# =============================================================================
+API_KEY = os.getenv("HF_TOKEN") or os.getenv("API_KEY") or ""
+API_BASE_URL = os.getenv("API_BASE_URL") or "https://router.huggingface.co/v1"
+MODEL_NAME = os.getenv("MODEL_NAME") or "Qwen/Qwen2.5-72B-Instruct"
+BENCHMARK = "cybersocenv"
+TASKS = ["easy", "medium", "hard"]
+MAX_STEPS = {"easy": 15, "medium": 25, "hard": 30}
+TEMPERATURE = 0.1
+MAX_TOKENS = 1024
+# Scoring: normalize rewards to [0, 1]
+MAX_POSSIBLE_REWARD = 2.0  # Approximate max reward per episode
+SUCCESS_SCORE_THRESHOLD = 0.3
+# =============================================================================
+# System Prompt
+# =============================================================================
+SYSTEM_PROMPT = textwrap.dedent("""
+    You are an expert Cybersecurity SOC (Security Operations Center) Analyst AI.
+    You are responding to security incidents on a 500-node enterprise network.
+    Your goal: Investigate alerts, contain all threats, and submit a containment plan — while minimizing business downtime.
+    Available Actions (respond with exactly ONE JSON object per turn):
+    1. Query a host: {"type": "query_host", "hostname": "<HOST>"}
+    2. Isolate a segment (causes downtime): {"type": "isolate_segment", "subnet": "<SUBNET>", "reason": "<WHY>"}
+    3. Block an IOC: {"type": "block_ioc", "ioc_value": "<VALUE>", "ioc_type": "ip|domain|hash"}
+    4. Run forensics: {"type": "run_forensics", "hostname": "<HOST>"}
+    5. Kill a process: {"type": "kill_process", "hostname": "<HOST>", "process_name": "<PROC>"}
+    6. Submit containment plan (ends episode): {"type": "submit_containment_plan", "plan": [{"threat_id": "<ID>", "actions_taken": [...], "root_cause": "<CAUSE>", "confidence": 0.0-1.0}], "executive_summary": "<SUMMARY>"}
+    Rules:
+    - Respond with ONLY a valid JSON object. No markdown, no explanation.
+    - Investigate before acting. Query hosts and run forensics to gather evidence.
+    - Block IOCs (IPs, domains, hashes) found in alerts and forensics.
+    - Kill malicious processes found via forensics.
+    - Avoid unnecessary subnet isolation — it increases business impact.
+    - Submit the containment plan once you've contained all threats.
+    - You have a limited number of steps. Be efficient.
+""").strip()
+# =============================================================================
+# Logging Helpers (EXACT hackathon format — lowercase booleans, null errors)
+# =============================================================================
+def log_start(task: str, env: str, model: str) -> None:
+    print(f"[START] task={task} env={env} model={model}", flush=True)
+def log_step(step: int, action: str, reward: float, done: bool, error: Optional[str]) -> None:
+    error_val = error if error else "null"
+    done_val = str(done).lower()
+    print(
+        f"[STEP] step={step} action={action} reward={reward:.2f} done={done_val} error={error_val}",
+        flush=True,
+    )
+def log_end(success: bool, steps: int, score: float, rewards: List[float]) -> None:
+    rewards_str = ",".join(f"{r:.2f}" for r in rewards)
+    print(
+        f"[END] success={str(success).lower()} steps={steps} score={score:.3f} rewards={rewards_str}",
+        flush=True,
+    )
+# =============================================================================
+# Observation Formatting for LLM
+# =============================================================================
+def format_observation(obs: SOCObservation) -> str:
+    """Format observation into readable text for the LLM."""
+    parts = []
+    # Alert queue
+    if obs.alert_queue:
+        parts.append(f"## Active Alerts ({len(obs.alert_queue)}):")
+        for a in obs.alert_queue:
+            parts.append(
+                f"  - [{a.severity.value.upper()}] {a.alert_id} "
+                f"on {a.source_host} ({a.subnet}): {a.description}"
+            )
+            if a.ioc_indicators:
+                parts.append(f"    IOCs: {', '.join(a.ioc_indicators)}")
+    # Network topology
+    topo = obs.network_topology
+    parts.append(f"\n## Network Status:")
+    parts.append(f"  Compromised: {topo.compromised_count} | "
+                 f"Isolated: {topo.isolated_count} | "
+                 f"Online: {topo.online_count}")
+    # Forensics
+    if obs.host_forensics:
+        f = obs.host_forensics
+        parts.append(f"\n## Forensics Result ({f.hostname}):")
+        parts.append(f"  Compromised: {f.is_compromised}")
+        parts.append(f"  Malicious processes: {f.malicious_processes}")
+        parts.append(f"  Suspicious files: {f.suspicious_files}")
+        parts.append(f"  Network connections: {f.network_connections}")
+        parts.append(f"  Memory artifacts: {f.memory_artifacts}")
+    # Active threats
+    parts.append(f"\n## Active Threats: {obs.active_threats if obs.active_threats else 'None (all contained!)'}")
+    parts.append(f"## Business Impact: {obs.business_impact_score:.2f}")
+    parts.append(f"## Step: {obs.step_count} / {obs.max_steps}")
+    # Timeline (last 5)
+    if obs.timeline:
+        parts.append(f"\n## Recent Actions:")
+        for t in obs.timeline[-5:]:
+            parts.append(f"  Step {t.step}: {t.action_type} -> {t.target} (reward={t.reward:.2f})")
+    return "\n".join(parts)
+def parse_llm_action(content: str) -> Dict[str, Any]:
+    """Parse the LLM's response into a valid action dict."""
+    content = content.strip()
+    if content.startswith("```"):
+        lines = content.split("\n")
+        lines = [l for l in lines if not l.strip().startswith("```")]
+        content = "\n".join(lines).strip()
+    try:
+        action = json.loads(content)
+        if isinstance(action, dict) and "type" in action:
+            return action
+    except json.JSONDecodeError:
+        pass
+    # Try to find JSON in the response
+    for start in range(len(content)):
+        if content[start] == "{":
+            for end in range(len(content), start, -1):
+                if content[end - 1] == "}":
+                    try:
+                        action = json.loads(content[start:end])
+                        if isinstance(action, dict) and "type" in action:
+                            return action
+                    except json.JSONDecodeError:
+                        continue
+    raise ValueError(f"Could not parse action from LLM response: {content[:200]}")
+def get_model_action(
+    client: OpenAI,
+    step: int,
+    obs: SOCObservation,
+    task_id: str,
+    history: List[str],
+) -> str:
+    """Get the next action from the LLM."""
+    obs_text = format_observation(obs)
+    if step == 1:
+        user_content = (
+            f"## Incident Briefing (Task: {task_id.upper()})\n\n"
+            f"{obs_text}\n\n"
+            f"Analyze the alerts and begin your investigation. Respond with a single JSON action."
+        )
+    else:
+        user_content = (
+            f"## Observation after your action:\n\n"
+            f"{obs_text}\n\n"
+            f"Continue your investigation. Respond with a single JSON action."
+        )
+    try:
+        completion = client.chat.completions.create(
+            model=MODEL_NAME,
+            messages=[
+                {"role": "system", "content": SYSTEM_PROMPT},
+                {"role": "user", "content": user_content},
+            ],
+            temperature=TEMPERATURE,
+            max_tokens=MAX_TOKENS,
+            stream=False,
+        )
+        text = (completion.choices[0].message.content or "").strip()
+        return text if text else '{"type": "query_host", "hostname": "WS-001"}'
+    except Exception as exc:
+        if "429" in str(exc) or "RateLimit" in str(exc):
+            raise  # Let the batch runner handle rate limits
+        print(f"[DEBUG] Model request failed: {exc}", flush=True)
+        return '{"type": "query_host", "hostname": "WS-001"}'
+# =============================================================================
+# Episode Runner
+# =============================================================================
+async def run_episode(client: OpenAI, task_id: str) -> tuple:
+    """Run a single episode. Returns (success, steps, score, rewards)."""
+    env = CyberSOCEnvironment()
+    history: List[str] = []
+    rewards: List[float] = []
+    steps_taken = 0
+    score = 0.0
+    success = False
+    log_start(task=task_id, env=BENCHMARK, model=MODEL_NAME)
+    try:
+        # Reset environment
+        obs = env.reset(task_id=task_id)
+        max_steps = MAX_STEPS.get(task_id, 30)
+        for step in range(1, max_steps + 1):
+            if obs.done:
+                break
+            # Get action from LLM
+            llm_response = get_model_action(client, step, obs, task_id, history)
+            # Parse and execute
+            error = None
+            action_str = "unknown"
+            reward = 0.0
+            try:
+                action_dict = parse_llm_action(llm_response)
+                action_str = action_dict.get("type", "unknown")
+                action = SOCActionWrapper(**action_dict)
+                obs = env.step(action)
+                reward = obs.reward or 0.0
+                done = obs.done
+            except Exception as exc:
+                error = str(exc)[:200]
+                done = False
+                reward = 0.0
+            rewards.append(reward)
+            steps_taken = step
+            log_step(step=step, action=action_str, reward=reward, done=done, error=error)
+            history.append(f"Step {step}: {action_str} -> reward {reward:+.2f}")
+            if done:
+                break
+        # Calculate score from final_score if available, else normalize rewards
+        if obs.final_score is not None:
+            score = obs.final_score
+        else:
+            score = sum(rewards) / MAX_POSSIBLE_REWARD if MAX_POSSIBLE_REWARD > 0 else 0.0
+        score = min(max(score, 0.0), 1.0)  # clamp to [0, 1]
+        success = score >= SUCCESS_SCORE_THRESHOLD
+    finally:
+        log_end(success=success, steps=steps_taken, score=score, rewards=rewards)
+    return success, steps_taken, score, rewards
+# =============================================================================
+# Main
+# =============================================================================
+async def main() -> None:
+    """Run baseline inference across all tasks."""
+    client = OpenAI(base_url=API_BASE_URL, api_key=API_KEY)
+    total_scores = {}
+    for task_id in TASKS:
+        success, steps, score, rewards = await run_episode(client, task_id)
+        total_scores[task_id] = score
+    # Print summary
+    avg = sum(total_scores.values()) / len(total_scores) if total_scores else 0.0
+    print(f"\n# Summary: avg_score={avg:.3f}", flush=True)
+    for tid, s in total_scores.items():
+        print(f"#   {tid}: {s:.3f}", flush=True)
+if __name__ == "__main__":
+    asyncio.run(main())
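The score fallback at the end of `run_episode` can be pulled out into a small pure function so it is easy to check on its own. The following is a minimal sketch, not code from this commit: the helper name `episode_score` is hypothetical, and `max_possible_reward` stands in for the `MAX_POSSIBLE_REWARD` constant that is defined earlier in inference.py and not shown in this part of the diff.

```python
from typing import List, Optional


def episode_score(
    final_score: Optional[float],
    rewards: List[float],
    max_possible_reward: float,
) -> float:
    """Prefer the grader's final score; otherwise normalize the summed rewards.

    Hypothetical helper mirroring the logic in run_episode; not part of the commit.
    """
    if final_score is not None:
        score = final_score
    elif max_possible_reward > 0:
        score = sum(rewards) / max_possible_reward
    else:
        score = 0.0
    # Clamp to [0, 1] so negative reward totals or overshoot cannot leak out.
    return min(max(score, 0.0), 1.0)


if __name__ == "__main__":
    print(episode_score(0.82, [0.1, 0.2], 2.0))  # grader score takes priority
    print(episode_score(None, [0.5, 0.5], 2.0))  # falls back to normalized rewards
    print(episode_score(None, [-1.0], 2.0))      # negative totals are clamped
```

Keeping the clamp inside the helper means the success check (`score >= SUCCESS_SCORE_THRESHOLD`) always compares against a value in the same range the grader uses.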

models.py ADDED Viewed

	@@ -0,0 +1,333 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+"""
+Data models for the CyberSOCEnv — Enterprise Cybersecurity Operations Center.
+Defines strict Pydantic models for:
+- Observation: What the agent sees (alerts, forensics, network state, business impact)
+- Action: What the agent can do (discriminated union of 6 action types)
+- Internal state: Deterministic network graph, attack chains, and task tracking
+"""
+from __future__ import annotations
+from enum import Enum
+from typing import Annotated, Any, Dict, List, Literal, Optional, Union
+from openenv.core.env_server.types import Action, Observation, State
+from pydantic import BaseModel, ConfigDict, Field
+# =============================================================================
+# Enums
+# =============================================================================
+class Severity(str, Enum):
+    """SIEM alert severity levels."""
+    LOW = "low"
+    MEDIUM = "medium"
+    HIGH = "high"
+    CRITICAL = "critical"
+class ThreatType(str, Enum):
+    """Classification of threat types in the SOC environment."""
+    RANSOMWARE = "ransomware"
+    PHISHING = "phishing"
+    CREDENTIAL_THEFT = "credential_theft"
+    LATERAL_MOVEMENT = "lateral_movement"
+    C2_COMMUNICATION = "c2_communication"
+    DATA_EXFILTRATION = "data_exfiltration"
+    PRIVILEGE_ESCALATION = "privilege_escalation"
+    MALWARE = "malware"
+    CRYPTOMINING = "cryptomining"
+    SUPPLY_CHAIN = "supply_chain"
+    INSIDER_THREAT = "insider_threat"
+    WEBSHELL = "webshell"
+    BOTNET = "botnet"
+class HostStatus(str, Enum):
+    """Host operational status."""
+    ONLINE = "online"
+    COMPROMISED = "compromised"
+    ISOLATED = "isolated"
+    OFFLINE = "offline"
+class SubnetRole(str, Enum):
+    """Business function of a network subnet."""
+    CORPORATE = "corporate"
+    ENGINEERING = "engineering"
+    FINANCE = "finance"
+    DMZ = "dmz"
+    DATACENTER = "datacenter"
+    EXECUTIVE = "executive"
+# =============================================================================
+# Alert & Network Sub-Models (used in Observation)
+# =============================================================================
+class Alert(BaseModel):
+    """A single SIEM/EDR alert in the queue."""
+    model_config = ConfigDict(extra="forbid")
+    alert_id: str = Field(..., description="Unique alert identifier")
+    timestamp: str = Field(..., description="ISO-8601 timestamp of the alert")
+    source_host: str = Field(..., description="Hostname that generated the alert")
+    severity: Severity = Field(..., description="Alert severity level")
+    threat_type: ThreatType = Field(..., description="Classified threat type")
+    description: str = Field(..., description="Human-readable alert description")
+    ioc_indicators: List[str] = Field(
+        default_factory=list,
+        description="Indicators of compromise (IPs, hashes, domains)",
+    )
+    subnet: str = Field(..., description="Subnet where the alert originated")
+    is_acknowledged: bool = Field(default=False, description="Whether the SOC analyst has acknowledged this alert")
+class HostInfo(BaseModel):
+    """Summary information about a single network host."""
+    model_config = ConfigDict(extra="forbid")
+    hostname: str = Field(..., description="Host FQDN")
+    ip_address: str = Field(..., description="IPv4 address")
+    subnet: str = Field(..., description="Subnet the host belongs to")
+    role: SubnetRole = Field(..., description="Business function")
+    status: HostStatus = Field(default=HostStatus.ONLINE, description="Current status")
+    running_processes: List[str] = Field(default_factory=list, description="Running process names")
+    open_ports: List[int] = Field(default_factory=list, description="Open TCP ports")
+    criticality: float = Field(
+        default=0.5, ge=0.0, le=1.0,
+        description="Business criticality score (0=low, 1=mission-critical)",
+    )
+class NetworkTopology(BaseModel):
+    """Summarized view of the 500-node enterprise network."""
+    model_config = ConfigDict(extra="forbid")
+    total_hosts: int = Field(default=500, description="Total hosts in the network")
+    subnets: Dict[str, int] = Field(
+        default_factory=dict,
+        description="Map of subnet name -> host count",
+    )
+    compromised_count: int = Field(default=0, description="Number of compromised hosts")
+    isolated_count: int = Field(default=0, description="Number of isolated hosts")
+    online_count: int = Field(default=500, description="Number of online hosts")
+class ForensicsResult(BaseModel):
+    """Results from running forensics on a host."""
+    model_config = ConfigDict(extra="forbid")
+    hostname: str = Field(..., description="Analyzed host")
+    malicious_processes: List[str] = Field(default_factory=list, description="Detected malicious processes")
+    suspicious_files: List[str] = Field(default_factory=list, description="Suspicious file paths found")
+    network_connections: List[str] = Field(
+        default_factory=list,
+        description="Suspicious outbound connections (ip:port)",
+    )
+    registry_modifications: List[str] = Field(default_factory=list, description="Modified registry keys")
+    memory_artifacts: List[str] = Field(default_factory=list, description="In-memory IOCs found")
+    is_compromised: bool = Field(default=False, description="Whether forensics confirm compromise")
+class TimelineEntry(BaseModel):
+    """A single entry in the analyst action timeline."""
+    model_config = ConfigDict(extra="forbid")
+    step: int = Field(..., description="Step number when this action was taken")
+    action_type: str = Field(..., description="Type of action taken")
+    target: str = Field(..., description="Target of the action (host, subnet, IOC)")
+    result: str = Field(..., description="Outcome description")
+    reward: float = Field(default=0.0, description="Reward received for this action")
+# =============================================================================
+# Observation
+# =============================================================================
+class SOCObservation(Observation):
+    """What the SOC agent sees at each step.
+    Extends OpenEnv Observation (inherits: done, reward, metadata).
+    """
+    alert_queue: List[Alert] = Field(
+        default_factory=list,
+        description="Current queue of unresolved SIEM/EDR alerts",
+    )
+    network_topology: NetworkTopology = Field(
+        default_factory=NetworkTopology,
+        description="Summary of the enterprise network state",
+    )
+    host_forensics: Optional[ForensicsResult] = Field(
+        default=None,
+        description="Forensics results if RunForensics was the last action, else None",
+    )
+    timeline: List[TimelineEntry] = Field(
+        default_factory=list,
+        description="Chronological log of all actions taken in this episode",
+    )
+    business_impact_score: float = Field(
+        default=0.0, ge=0.0, le=1.0,
+        description="Current business impact (0=no impact, 1=catastrophic outage)",
+    )
+    step_count: int = Field(default=0, ge=0, description="Current step number")
+    active_threats: List[str] = Field(
+        default_factory=list,
+        description="List of threat IDs that are still active/uncontained",
+    )
+    max_steps: int = Field(default=30, description="Maximum steps allowed in this episode")
+    task_id: str = Field(default="easy", description="Current task identifier")
+    total_reward: float = Field(default=0.0, description="Accumulated episode reward")
+    final_score: Optional[float] = Field(
+        default=None,
+        description="Post-episode grader score (0.0-1.0). Only set when done=True and plan submitted.",
+    )
+    grade_breakdown: Optional[Dict[str, Any]] = Field(
+        default=None,
+        description="Detailed grading breakdown. Only set when done=True and plan submitted.",
+    )
+# =============================================================================
+# Actions (Discriminated Union)
+# =============================================================================
+class QueryHost(Action):
+    """Query a specific host for status, processes, and connections."""
+    type: Literal["query_host"] = Field(default="query_host", description="Action discriminator")
+    hostname: str = Field(..., description="Target hostname to query")
+class IsolateSegment(Action):
+    """Isolate an entire network segment from the rest of the network."""
+    type: Literal["isolate_segment"] = Field(default="isolate_segment", description="Action discriminator")
+    subnet: str = Field(..., description="Subnet name to isolate (e.g. 'finance', 'engineering')")
+    reason: str = Field(default="", description="Justification for isolation")
+class BlockIOC(Action):
+    """Block an Indicator of Compromise at the perimeter firewall."""
+    type: Literal["block_ioc"] = Field(default="block_ioc", description="Action discriminator")
+    ioc_value: str = Field(..., description="The IOC to block (IP, domain, or file hash)")
+    ioc_type: Literal["ip", "domain", "hash"] = Field(..., description="Type of IOC")
+class RunForensics(Action):
+    """Run deep forensic analysis on a specific host."""
+    type: Literal["run_forensics"] = Field(default="run_forensics", description="Action discriminator")
+    hostname: str = Field(..., description="Target hostname for forensics")
+class KillProcess(Action):
+    """Terminate a specific process on a host."""
+    type: Literal["kill_process"] = Field(default="kill_process", description="Action discriminator")
+    hostname: str = Field(..., description="Host where the process is running")
+    process_name: str = Field(..., description="Name of the process to terminate")
+class ContainmentEntry(BaseModel):
+    """A single entry in the containment plan."""
+    model_config = ConfigDict(extra="forbid")
+    threat_id: str = Field(..., description="Threat being addressed")
+    actions_taken: List[str] = Field(..., description="List of actions taken to contain this threat")
+    root_cause: str = Field(..., description="Identified root cause")
+    confidence: float = Field(
+        ..., ge=0.0, le=1.0,
+        description="Confidence in the containment (0-1)",
+    )
+class SubmitContainmentPlan(Action):
+    """Submit the final containment plan to end the episode."""
+    type: Literal["submit_containment_plan"] = Field(
+        default="submit_containment_plan", description="Action discriminator"
+    )
+    plan: List[ContainmentEntry] = Field(
+        ..., description="The containment plan addressing all identified threats"
+    )
+    executive_summary: str = Field(
+        ..., description="Brief executive summary for CISO reporting"
+    )
+# Discriminated union of all SOC actions
+SOCAction = Annotated[
+    Union[QueryHost, IsolateSegment, BlockIOC, RunForensics, KillProcess, SubmitContainmentPlan],
+    Field(discriminator="type"),
+]
+# Wrapper model so OpenEnv's create_app can accept it as a single Action class
+class SOCActionWrapper(Action):
+    """Wrapper that deserializes the discriminated union action.
+    OpenEnv's create_app expects a single Action subclass. This wrapper
+    accepts a string `type` discriminator plus arbitrary extra fields
+    (extra="allow"), so the HTTP/WS layer can parse any of the 6 action
+    types from a flat JSON payload.
+    Client sends:  {"action": {"type": "query_host", "hostname": "WS-001"}}
+    to_typed_action() then validates -> QueryHost(hostname="WS-001")
+    """
+    type: str = Field(..., description="Action type discriminator")
+    model_config = ConfigDict(extra="allow")  # Allow action-specific fields
+    def to_typed_action(self) -> Union[QueryHost, IsolateSegment, BlockIOC, RunForensics, KillProcess, SubmitContainmentPlan]:
+        """Convert the raw wrapper into the correctly typed action."""
+        data = self.model_dump(exclude={"metadata"})
+        action_map = {
+            "query_host": QueryHost,
+            "isolate_segment": IsolateSegment,
+            "block_ioc": BlockIOC,
+            "run_forensics": RunForensics,
+            "kill_process": KillProcess,
+            "submit_containment_plan": SubmitContainmentPlan,
+        }
+        cls = action_map.get(data["type"])
+        if cls is None:
+            raise ValueError(
+                f"Unknown action type: {data['type']}. "
+                f"Valid types: {list(action_map.keys())}"
+            )
+        return cls(**data)
+# =============================================================================
+# Internal State (not exposed to agent directly)
+# =============================================================================
+class SOCState(State):
+    """Internal environment state tracking the attack simulation.
+    Extends OpenEnv State (inherits: episode_id, step_count).
+    Uses extra='allow' from base State.
+    """
+    task_id: str = Field(default="easy", description="Current task: 'easy', 'medium', or 'hard'")
+    max_steps: int = Field(default=30, description="Maximum steps for this episode")
+    total_reward: float = Field(default=0.0, description="Accumulated reward")
+    business_impact: float = Field(default=0.0, ge=0.0, le=1.0, description="Current business impact score")
+    contained_threats: List[str] = Field(default_factory=list, description="Threat IDs that have been contained")
+    active_threats: List[str] = Field(default_factory=list, description="Currently active threat IDs")
+    blocked_iocs: List[str] = Field(default_factory=list, description="IOCs blocked at perimeter")
+    isolated_subnets: List[str] = Field(default_factory=list, description="Isolated network segments")
+    forensics_run: List[str] = Field(default_factory=list, description="Hosts that had forensics run")
+    killed_processes: List[Dict[str, str]] = Field(default_factory=list, description="Processes killed")
+    queried_hosts: List[str] = Field(default_factory=list, description="Hosts queried")
+    timeline: List[Dict[str, Any]] = Field(default_factory=list, description="Action timeline")
+    is_done: bool = Field(default=False, description="Whether episode has ended")
+    submitted_plan: bool = Field(default=False, description="Whether containment plan was submitted")
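The dispatch that `SOCActionWrapper.to_typed_action` performs (look up a class by the `type` field, then construct it from the remaining fields) can be illustrated without any dependencies. This is a simplified sketch under stated assumptions: it uses stdlib dataclasses as stand-ins for the Pydantic models, covers only two of the six actions, and the names `ACTION_MAP` and the free function `to_typed_action` are illustrative rather than taken from the commit.

```python
from dataclasses import dataclass
from typing import Any, Dict, Type, Union


@dataclass(frozen=True)
class QueryHost:
    """Stand-in for the Pydantic QueryHost action."""
    hostname: str
    type: str = "query_host"


@dataclass(frozen=True)
class KillProcess:
    """Stand-in for the Pydantic KillProcess action."""
    hostname: str
    process_name: str
    type: str = "kill_process"


TypedAction = Union[QueryHost, KillProcess]

# Maps the discriminator value to the concrete class (illustrative subset).
ACTION_MAP: Dict[str, Type[Any]] = {
    "query_host": QueryHost,
    "kill_process": KillProcess,
}


def to_typed_action(payload: Dict[str, Any]) -> TypedAction:
    """Pick the concrete class from the 'type' field, then construct it."""
    cls = ACTION_MAP.get(payload.get("type", ""))
    if cls is None:
        raise ValueError(
            f"Unknown action type: {payload.get('type')!r}. "
            f"Valid types: {sorted(ACTION_MAP)}"
        )
    # Missing or unexpected fields surface here as a TypeError.
    return cls(**payload)


if __name__ == "__main__":
    print(to_typed_action({"type": "query_host", "hostname": "WS-001"}))
```

In the real models the construction step runs Pydantic validation, so field errors would be reported as validation errors rather than the `TypeError` a dataclass raises.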

openenv.yaml ADDED Viewed

The diff for this file is too large to render. See raw diff

openenv_play.egg-info/PKG-INFO ADDED Viewed

	@@ -0,0 +1,9 @@

+Metadata-Version: 2.4
+Name: openenv-play
+Version: 0.1.0
+Summary: Play environment for OpenEnv
+Requires-Python: >=3.10
+Requires-Dist: openenv-core[core]>=0.2.2
+Provides-Extra: dev
+Requires-Dist: pytest>=8.0.0; extra == "dev"
+Requires-Dist: pytest-cov>=4.0.0; extra == "dev"

openenv_play.egg-info/SOURCES.txt ADDED Viewed

	@@ -0,0 +1,14 @@

+README.md
+pyproject.toml
+./__init__.py
+./client.py
+./models.py
+openenv_play.egg-info/PKG-INFO
+openenv_play.egg-info/SOURCES.txt
+openenv_play.egg-info/dependency_links.txt
+openenv_play.egg-info/entry_points.txt
+openenv_play.egg-info/requires.txt
+openenv_play.egg-info/top_level.txt
+server/__init__.py
+server/app.py
+server/play_environment.py

openenv_play.egg-info/dependency_links.txt ADDED Viewed

	@@ -0,0 +1 @@


+

openenv_play.egg-info/entry_points.txt ADDED Viewed

	@@ -0,0 +1,2 @@


+[console_scripts]
+server = play.server.app:main

openenv_play.egg-info/requires.txt ADDED Viewed

	@@ -0,0 +1,5 @@

+openenv-core[core]>=0.2.2
+[dev]
+pytest>=8.0.0
+pytest-cov>=4.0.0

openenv_play.egg-info/top_level.txt ADDED Viewed

	@@ -0,0 +1 @@


+play

pyproject.toml ADDED Viewed

	@@ -0,0 +1,36 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+[build-system]
+requires = ["setuptools>=45", "wheel"]
+build-backend = "setuptools.build_meta"
+[project]
+name = "openenv-cybersocenv"
+version = "0.1.0"
+description = "CyberSOCEnv — Enterprise SOC Incident Response environment for OpenEnv"
+requires-python = ">=3.10"
+dependencies = [
+    # Core OpenEnv runtime (provides FastAPI server + HTTP client types)
+    "openenv-core[core]>=0.2.2",
+    # Inference dependencies
+    "openai>=1.0.0",
+    "websockets>=12.0",
+]
+[project.optional-dependencies]
+dev = [
+    "pytest>=8.0.0",
+    "pytest-cov>=4.0.0",
+]
+[project.scripts]
+server = "play.server.app:main"
+[tool.setuptools]
+include-package-data = true
+packages = ["play", "play.server"]
+package-dir = { "play" = ".", "play.server" = "server" }

requirements.txt ADDED Viewed

	@@ -0,0 +1,7 @@

+fastapi
+uvicorn[standard]
+pydantic
+networkx
+websockets
+openai
+tenacity

server/Dockerfile ADDED Viewed

	@@ -0,0 +1,80 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+# Multi-stage build using openenv-base
+# This Dockerfile is flexible and works for both:
+# - In-repo environments (with local OpenEnv sources)
+# - Standalone environments (with openenv from PyPI/Git)
+# The build script (openenv build) handles context detection and sets appropriate build args.
+ARG BASE_IMAGE=ghcr.io/meta-pytorch/openenv-base:latest
+FROM ${BASE_IMAGE} AS builder
+WORKDIR /app
+# Ensure git is available (required for installing dependencies from VCS)
+RUN apt-get update && \
+    apt-get install -y --no-install-recommends git && \
+    rm -rf /var/lib/apt/lists/*
+# Build argument to control whether we're building standalone or in-repo
+ARG BUILD_MODE=in-repo
+ARG ENV_NAME=play
+# Copy environment code (always at root of build context)
+COPY . /app/env
+# For in-repo builds, openenv is already vendored in the build context
+# For standalone builds, openenv will be installed via pyproject.toml
+WORKDIR /app/env
+# Ensure uv is available (for local builds where base image lacks it)
+RUN if ! command -v uv >/dev/null 2>&1; then \
+        curl -LsSf https://astral.sh/uv/install.sh | sh && \
+        mv /root/.local/bin/uv /usr/local/bin/uv && \
+        mv /root/.local/bin/uvx /usr/local/bin/uvx; \
+    fi
+# Install dependencies using uv sync
+# If uv.lock exists, use it; otherwise resolve on the fly
+RUN --mount=type=cache,target=/root/.cache/uv \
+    if [ -f uv.lock ]; then \
+        uv sync --frozen --no-install-project --no-editable; \
+    else \
+        uv sync --no-install-project --no-editable; \
+    fi
+RUN --mount=type=cache,target=/root/.cache/uv \
+    if [ -f uv.lock ]; then \
+        uv sync --frozen --no-editable; \
+    else \
+        uv sync --no-editable; \
+    fi
+# Final runtime stage
+FROM ${BASE_IMAGE}
+WORKDIR /app
+# Copy the virtual environment from builder
+COPY --from=builder /app/env/.venv /app/.venv
+# Copy the environment code
+COPY --from=builder /app/env /app/env
+# Set PATH to use the virtual environment
+ENV PATH="/app/.venv/bin:$PATH"
+# Set PYTHONPATH so imports work correctly
+ENV PYTHONPATH="/app/env:$PYTHONPATH"
+# Health check
+HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
+    CMD curl -f http://localhost:8000/health || exit 1
+# Run the FastAPI server
+# The module path is constructed to work with the /app/env structure
+CMD ["sh", "-c", "cd /app/env && uvicorn server.app:app --host 0.0.0.0 --port 8000"]

server/__init__.py ADDED Viewed

	@@ -0,0 +1,11 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+"""CyberSOCEnv server components."""
+from .play_environment import CyberSOCEnvironment
+__all__ = ["CyberSOCEnvironment"]

server/__pycache__/__init__.cpython-311.pyc ADDED Viewed

Binary file (296 Bytes). View file

server/__pycache__/app.cpython-311.pyc ADDED Viewed

Binary file (2.4 kB). View file

server/__pycache__/graders.cpython-311.pyc ADDED Viewed

Binary file (7.07 kB). View file

server/__pycache__/play_environment.cpython-311.pyc ADDED Viewed

Binary file (25.4 kB). View file

server/__pycache__/task_generator.cpython-311.pyc ADDED Viewed

Binary file (28.9 kB). View file

server/__pycache__/tasks.cpython-311.pyc ADDED Viewed

Binary file (10.8 kB). View file

server/app.py ADDED Viewed

	@@ -0,0 +1,62 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+"""
+FastAPI application for the CyberSOCEnv Environment.
+Endpoints:
+    - POST /reset: Reset the environment (pass task_id in body)
+    - POST /step: Execute an action
+    - GET /state: Get current environment state
+    - GET /schema: Get action/observation schemas
+    - WS /ws: WebSocket endpoint for persistent sessions
+Usage:
+    # Development (with auto-reload):
+    uvicorn server.app:app --reload --host 0.0.0.0 --port 8000
+    # Production:
+    uvicorn server.app:app --host 0.0.0.0 --port 8000 --workers 4
+"""
+try:
+    from openenv.core.env_server.http_server import create_app
+except Exception as e:  # pragma: no cover
+    raise ImportError(
+        "openenv is required. Install with: pip install 'openenv-core[core]'"
+    ) from e
+try:
+    from ..models import SOCObservation, SOCActionWrapper
+    from .play_environment import CyberSOCEnvironment
+except ImportError:  # also covers ModuleNotFoundError, which is a subclass
+    from models import SOCObservation, SOCActionWrapper
+    from server.play_environment import CyberSOCEnvironment
+# Create the app with the CyberSOCEnv environment
+app = create_app(
+    CyberSOCEnvironment,
+    SOCActionWrapper,
+    SOCObservation,
+    env_name="cybersocenv",
+    max_concurrent_envs=4,
+)
+def main(host: str = "0.0.0.0", port: int = 8000):
+    """Entry point for direct execution.
+    Usage:
+        python -m play.server.app
+        python -m play.server.app --port 8001
+    """
+    import uvicorn
+    uvicorn.run(app, host=host, port=port)
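+if __name__ == "__main__":
+    import argparse
+    # Parse --host/--port so the usage shown in main()'s docstring works
+    parser = argparse.ArgumentParser(description="Run the CyberSOCEnv server")
+    parser.add_argument("--host", default="0.0.0.0")
+    parser.add_argument("--port", type=int, default=8000)
+    args = parser.parse_args()
+    main(host=args.host, port=args.port)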
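The endpoints listed in the app.py docstring can be exercised with plain JSON POSTs. The sketch below only builds the requests and does not send them. The body shapes are assumptions inferred from the docstrings in this commit (`task_id` in the `/reset` body, and the `{"action": {...}}` envelope described in `SOCActionWrapper`); the exact schema is determined by OpenEnv's `create_app`, so check `GET /schema` on a running server. `BASE_URL` and `build_post` are hypothetical names.

```python
import json
from typing import Any, Dict
from urllib.request import Request

# Assumption: the server was started locally with the uvicorn command in the docstring.
BASE_URL = "http://localhost:8000"


def build_post(path: str, body: Dict[str, Any]) -> Request:
    """Build (but do not send) a JSON POST request for the given endpoint path."""
    return Request(
        url=f"{BASE_URL}{path}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed payload shapes, taken from the docstrings rather than a confirmed schema.
reset_req = build_post("/reset", {"task_id": "easy"})
step_req = build_post(
    "/step",
    {"action": {"type": "query_host", "hostname": "WS-001"}},
)

if __name__ == "__main__":
    for req in (reset_req, step_req):
        print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

Once the server is running, each request can be sent with `urllib.request.urlopen(req)`.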

server/graders.py ADDED Viewed

	@@ -0,0 +1,212 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+"""
+Deterministic post-episode graders for CyberSOCEnv.
+Each grader returns a float in [0.0, 1.0] based on how well the agent
+contained the incident. Grading is entirely deterministic — same actions
+always produce the same score.
+Scoring breakdown:
+    - Threat containment (40%):  Did the agent kill all malicious processes?
+    - IOC blocking (20%):        Were critical IOCs blocked at the perimeter?
+    - Forensic coverage (15%):   Were compromised hosts analyzed?
+    - Business impact (15%):     Was unnecessary downtime avoided?
+    - Plan quality (10%):        Did the final plan correctly identify root causes?
+"""
+from __future__ import annotations
+from typing import Any, Dict, List
+def grade_episode(
+    task_id: str,
+    task_def: Dict[str, Any],
+    killed_processes: List[Dict[str, str]],
+    blocked_iocs: List[str],
+    forensics_run: List[str],
+    isolated_subnets: List[str],
+    submitted_plan: bool,
+    plan_entries: List[Dict[str, Any]],
+    final_business_impact: float,
+    step_count: int,
+    total_reward: float,
+) -> float:
+    """Grade an episode deterministically.
+    Args:
+        task_id: The task that was run.
+        task_def: The full task definition from tasks.py.
+        killed_processes: List of {"hostname": ..., "process": ...} killed.
+        blocked_iocs: List of IOC values that were blocked.
+        forensics_run: List of hostnames where forensics were executed.
+        isolated_subnets: List of subnet names that were isolated.
+        submitted_plan: Whether the agent submitted a containment plan.
+        plan_entries: The containment plan entries (list of dicts).
+        final_business_impact: The final business_impact_score at episode end.
+        step_count: Total steps taken.
+        total_reward: Accumulated trajectory reward.
+    Returns:
+        Float in [0.0, 1.0] — the final episode score.
+    """
+    requirements = task_def["containment_requirements"]
+    score = 0.0
+    # ---- 1. Threat Containment (40%) ----
+    must_kill = requirements["must_kill"]
+    if must_kill:
+        kills_matched = 0
+        for req in must_kill:
+            for k in killed_processes:
+                if k.get("hostname") == req["hostname"] and k.get("process") == req["process"]:
+                    kills_matched += 1
+                    break
+        containment_ratio = kills_matched / len(must_kill)
+        score += 0.40 * containment_ratio
+    else:
+        score += 0.40  # No kills required = full marks
+    # ---- 2. IOC Blocking (20%) ----
+    must_block = requirements["must_block_iocs"]
+    if must_block:
+        blocked_matched = sum(1 for ioc in must_block if ioc in blocked_iocs)
+        block_ratio = blocked_matched / len(must_block)
+        score += 0.20 * block_ratio
+    else:
+        score += 0.20
+    # ---- 3. Forensic Coverage (15%) ----
+    must_forensics = requirements["must_forensics"]
+    if must_forensics:
+        forensics_matched = sum(1 for h in must_forensics if h in forensics_run)
+        forensics_ratio = forensics_matched / len(must_forensics)
+        score += 0.15 * forensics_ratio
+    else:
+        score += 0.15
+    # ---- 4. Business Impact / Downtime (15%) ----
+    must_not_isolate = requirements.get("must_not_isolate", [])
+    unnecessary_isolations = sum(1 for s in isolated_subnets if s in must_not_isolate)
+    # Penalty for unnecessary isolations (each costs 33% of this category, about 5% of the total score)
+    isolation_penalty = min(1.0, unnecessary_isolations * 0.33)
+    # Penalty for high business impact
+    impact_penalty = final_business_impact  # 0.0 = perfect, 1.0 = catastrophic
+    downtime_score = max(0.0, 1.0 - isolation_penalty - impact_penalty * 0.5)
+    score += 0.15 * downtime_score
+    # ---- 5. Plan Quality (10%) ----
+    if submitted_plan and plan_entries:
+        # Check if plan addresses all attack chain threats
+        attack_threats = {t["threat_id"] for t in task_def["attack_chain"]}
+        plan_threats = {e.get("threat_id", "") for e in plan_entries}
+        threats_addressed = len(attack_threats & plan_threats)
+        if attack_threats:
+            plan_coverage = threats_addressed / len(attack_threats)
+        else:
+            plan_coverage = 1.0
+        # Average confidence of plan entries
+        confidences = [e.get("confidence", 0.0) for e in plan_entries]
+        avg_confidence = sum(confidences) / len(confidences) if confidences else 0.0
+        plan_score = 0.6 * plan_coverage + 0.4 * avg_confidence
+        score += 0.10 * plan_score
+    elif submitted_plan:
+        score += 0.02  # Submitted but empty plan
+    # else: no plan submitted = 0 for this category
+    # Clamp to [0.0, 1.0]
+    return round(max(0.0, min(1.0, score)), 4)
+def grade_easy(
+    killed_processes: List[Dict[str, str]],
+    blocked_iocs: List[str],
+    forensics_run: List[str],
+    isolated_subnets: List[str],
+    submitted_plan: bool,
+    plan_entries: List[Dict[str, Any]],
+    final_business_impact: float,
+    step_count: int,
+    total_reward: float,
+    task_def: Dict[str, Any],
+) -> float:
+    """Grade the easy task."""
+    return grade_episode(
+        task_id="easy",
+        task_def=task_def,
+        killed_processes=killed_processes,
+        blocked_iocs=blocked_iocs,
+        forensics_run=forensics_run,
+        isolated_subnets=isolated_subnets,
+        submitted_plan=submitted_plan,
+        plan_entries=plan_entries,
+        final_business_impact=final_business_impact,
+        step_count=step_count,
+        total_reward=total_reward,
+    )
+def grade_medium(
+    killed_processes: List[Dict[str, str]],
+    blocked_iocs: List[str],
+    forensics_run: List[str],
+    isolated_subnets: List[str],
+    submitted_plan: bool,
+    plan_entries: List[Dict[str, Any]],
+    final_business_impact: float,
+    step_count: int,
+    total_reward: float,
+    task_def: Dict[str, Any],
+) -> float:
+    """Grade the medium task."""
+    return grade_episode(
+        task_id="medium",
+        task_def=task_def,
+        killed_processes=killed_processes,
+        blocked_iocs=blocked_iocs,
+        forensics_run=forensics_run,
+        isolated_subnets=isolated_subnets,
+        submitted_plan=submitted_plan,
+        plan_entries=plan_entries,
+        final_business_impact=final_business_impact,
+        step_count=step_count,
+        total_reward=total_reward,
+    )
+def grade_hard(
+    killed_processes: List[Dict[str, str]],
+    blocked_iocs: List[str],
+    forensics_run: List[str],
+    isolated_subnets: List[str],
+    submitted_plan: bool,
+    plan_entries: List[Dict[str, Any]],
+    final_business_impact: float,
+    step_count: int,
+    total_reward: float,
+    task_def: Dict[str, Any],
+) -> float:
+    """Grade the hard task."""
+    return grade_episode(
+        task_id="hard",
+        task_def=task_def,
+        killed_processes=killed_processes,
+        blocked_iocs=blocked_iocs,
+        forensics_run=forensics_run,
+        isolated_subnets=isolated_subnets,
+        submitted_plan=submitted_plan,
+        plan_entries=plan_entries,
+        final_business_impact=final_business_impact,
+        step_count=step_count,
+        total_reward=total_reward,
+    )
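To see how the five weighted categories in `grade_episode` combine, it helps to work through the arithmetic with the per-category ratios already computed. This is a minimal sketch, not code from the commit: `combine_scores` is a hypothetical helper that takes the ratios as inputs and assumes a non-empty plan was submitted (the 0.02 credit for an empty plan is left out).

```python
def combine_scores(
    containment_ratio: float,
    block_ratio: float,
    forensics_ratio: float,
    unnecessary_isolations: int,
    business_impact: float,
    plan_coverage: float,
    avg_confidence: float,
) -> float:
    """Combine category ratios with the 40/20/15/15/10 weights used by the grader."""
    # Downtime: each unnecessary isolation removes 0.33 of this category,
    # and business impact removes up to half of it.
    isolation_penalty = min(1.0, unnecessary_isolations * 0.33)
    downtime_score = max(0.0, 1.0 - isolation_penalty - business_impact * 0.5)
    # Plan quality: 60% threat coverage, 40% average stated confidence.
    plan_score = 0.6 * plan_coverage + 0.4 * avg_confidence
    score = (
        0.40 * containment_ratio
        + 0.20 * block_ratio
        + 0.15 * forensics_ratio
        + 0.15 * downtime_score
        + 0.10 * plan_score
    )
    return round(max(0.0, min(1.0, score)), 4)


if __name__ == "__main__":
    # All processes killed, half the IOCs blocked, full forensics,
    # one unnecessary isolation, 0.2 business impact, full plan coverage at 0.8 confidence.
    print(combine_scores(1.0, 0.5, 1.0, 1, 0.2, 1.0, 0.8))
```

For the inputs in the example, the downtime category works out to 1 - 0.33 - 0.1 = 0.57 and the plan category to 0.6 + 0.32 = 0.92, so a single unnecessary isolation costs roughly five points of the total score.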

server/play_environment.py ADDED Viewed

	@@ -0,0 +1,594 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+"""
+CyberSOCEnv — Enterprise Cybersecurity Operations Center Environment.
+Implements the OpenEnv Environment interface for a deterministic SOC
+incident response simulation on a 500-node enterprise network.
+The agent receives SIEM/EDR alerts, queries hosts, runs forensics,
+isolates segments, blocks IOCs, kills processes, and submits a
+containment plan — all while minimizing business downtime.
+"""
+from __future__ import annotations
+import copy
+from typing import Any, Dict, List, Optional
+from uuid import uuid4
+from openenv.core.env_server.interfaces import Environment
+from openenv.core.env_server.types import State
+try:
+    from ..models import (
+        SOCObservation,
+        SOCActionWrapper,
+        SOCState,
+        Alert,
+        NetworkTopology,
+        ForensicsResult,
+        TimelineEntry,
+        QueryHost,
+        IsolateSegment,
+        BlockIOC,
+        RunForensics,
+        KillProcess,
+        SubmitContainmentPlan,
+    )
+except ImportError:
+    from models import (
+        SOCObservation,
+        SOCActionWrapper,
+        SOCState,
+        Alert,
+        NetworkTopology,
+        ForensicsResult,
+        TimelineEntry,
+        QueryHost,
+        IsolateSegment,
+        BlockIOC,
+        RunForensics,
+        KillProcess,
+        SubmitContainmentPlan,
+    )
+from .tasks import get_task, build_network
+from .graders import grade_episode
+class CyberSOCEnvironment(Environment):
+    """
+    Deterministic SOC incident response environment.
+    Simulates a 500-node enterprise network under attack. The agent must
+    investigate alerts, contain threats, and submit a containment plan
+    while minimizing business downtime.
+    Supports concurrent WebSocket sessions (each session gets its own instance).
+    Example:
+        >>> env = CyberSOCEnvironment()
+        >>> obs = env.reset(task_id="easy")
+        >>> print(len(obs.alert_queue))  # Initial alerts
+        >>> obs = env.step(SOCActionWrapper(type="query_host", hostname="WS-042"))
+    """
+    SUPPORTS_CONCURRENT_SESSIONS: bool = True
+    def __init__(self):
+        """Initialize the environment (actual state set in reset)."""
+        super().__init__()
+        self._state = SOCState(episode_id=str(uuid4()), step_count=0)
+        self._network: Dict[str, List[Dict[str, Any]]] = {}
+        self._task_def: Dict[str, Any] = {}
+        self._alert_queue: List[Dict[str, Any]] = []
+        self._host_index: Dict[str, Dict[str, Any]] = {}  # hostname -> host dict
+        self._plan_entries: List[Dict[str, Any]] = []
+        self._last_forensics: Optional[ForensicsResult] = None
+    # ===========================================================================
+    # reset()
+    # ===========================================================================
+    def reset(
+        self,
+        seed: Optional[int] = None,
+        episode_id: Optional[str] = None,
+        **kwargs: Any,
+    ) -> SOCObservation:
+        """Reset the environment for a specific task.
+        Args:
+            seed: Ignored (environment is fully deterministic).
+            episode_id: Optional custom episode ID.
+            **kwargs: May include task_id (e.g. 'easy', 'medium', or 'hard'); defaults to 'easy'.
+        Returns:
+            Initial SOCObservation with alert queue and network state.
+        """
+        task_id = kwargs.get("task_id", "easy")
+        self._task_def = get_task(task_id)
+        # Build deterministic network
+        self._network = build_network()
+        # Build hostname index for O(1) lookups
+        self._host_index = {}
+        for subnet_name, hosts in self._network.items():
+            for host in hosts:
+                self._host_index[host["hostname"]] = host
+        # Inject attack chain: mark compromised hosts, add malicious processes
+        for threat in self._task_def["attack_chain"]:
+            for hostname in threat["compromised_hosts"]:
+                if hostname in self._host_index:
+                    host = self._host_index[hostname]
+                    host["status"] = "compromised"
+                    for proc in threat["malicious_processes"]:
+                        if proc not in host["running_processes"]:
+                            host["running_processes"].append(proc)
+        # Initialize alert queue (deep copy so mutations don't affect task def)
+        self._alert_queue = copy.deepcopy(self._task_def["initial_alerts"])
+        # Reset state
+        eid = episode_id or str(uuid4())
+        self._state = SOCState(
+            episode_id=eid,
+            step_count=0,
+            task_id=task_id,
+            max_steps=self._task_def["max_steps"],
+            total_reward=0.0,
+            business_impact=self._task_def["initial_business_impact"],
+            contained_threats=[],
+            active_threats=[t["threat_id"] for t in self._task_def["attack_chain"]],
+            blocked_iocs=[],
+            isolated_subnets=[],
+            forensics_run=[],
+            killed_processes=[],
+            queried_hosts=[],
+            timeline=[],
+            is_done=False,
+            submitted_plan=False,
+        )
+        self._plan_entries = []
+        self._last_forensics = None
+        self._reset_rubric()
+        return self._build_observation(reward=0.0, done=False)
+    # ===========================================================================
+    # step()
+    # ===========================================================================
+    def step(
+        self,
+        action: SOCActionWrapper,  # type: ignore[override]
+        timeout_s: Optional[float] = None,
+        **kwargs: Any,
+    ) -> SOCObservation:
+        """Process one agent action.
+        Args:
+            action: SOCActionWrapper containing the typed action.
+            timeout_s: Ignored.
+        Returns:
+            SOCObservation with updated state, reward, and done flag.
+        """
+        if self._state.is_done:
+            return self._build_observation(reward=0.0, done=True)
+        # Increment step
+        self._state.step_count += 1
+        # Convert wrapper to typed action
+        typed_action = action.to_typed_action()
+        # Dispatch to handler
+        reward = 0.0
+        result_description = "unknown action"
+        if isinstance(typed_action, QueryHost):
+            reward, result_description = self._handle_query_host(typed_action)
+        elif isinstance(typed_action, IsolateSegment):
+            reward, result_description = self._handle_isolate_segment(typed_action)
+        elif isinstance(typed_action, BlockIOC):
+            reward, result_description = self._handle_block_ioc(typed_action)
+        elif isinstance(typed_action, RunForensics):
+            reward, result_description = self._handle_run_forensics(typed_action)
+        elif isinstance(typed_action, KillProcess):
+            reward, result_description = self._handle_kill_process(typed_action)
+        elif isinstance(typed_action, SubmitContainmentPlan):
+            reward, result_description = self._handle_submit_plan(typed_action)
+        # Business impact grows each step (attacker progresses)
+        if not self._state.is_done:
+            impact_rate = self._task_def.get("impact_per_step", 0.02)
+            # Reduce impact growth if threats are being contained
+            active_ratio = len(self._state.active_threats) / max(1, len(self._task_def["attack_chain"]))
+            self._state.business_impact = min(
+                1.0,
+                self._state.business_impact + impact_rate * active_ratio,
+            )
+        # Record timeline
+        self._state.timeline.append({
+            "step": self._state.step_count,
+            "action_type": typed_action.type,
+            "target": self._get_action_target(typed_action),
+            "result": result_description,
+            "reward": reward,
+        })
+        # Accumulate reward
+        self._state.total_reward += reward
+        # Check termination
+        done = False
+        if self._state.submitted_plan:
+            done = True
+            self._state.is_done = True
+        elif self._state.step_count >= self._state.max_steps:
+            done = True
+            self._state.is_done = True
+            reward -= 0.20  # Penalty for running out of time
+            self._state.total_reward -= 0.20
+        return self._build_observation(reward=reward, done=done)
+    # ===========================================================================
+    # Action Handlers (return (reward, description))
+    # ===========================================================================
+    def _handle_query_host(self, action: QueryHost) -> tuple[float, str]:
+        """Query a host for status info."""
+        hostname = action.hostname
+        self._last_forensics = None  # Clear forensics from previous step
+        if hostname not in self._host_index:
+            return -0.05, f"Host '{hostname}' not found in network"
+        host = self._host_index[hostname]
+        # Reward for querying compromised hosts (useful investigation)
+        reward = 0.0
+        if host["status"] == "compromised" and hostname not in self._state.queried_hosts:
+            reward = 0.05  # Good: investigating a compromised host
+        elif hostname in self._state.queried_hosts:
+            reward = -0.02  # Penalty: re-querying same host wastes time
+        self._state.queried_hosts.append(hostname)
+        return reward, f"Queried {hostname}: status={host['status']}, procs={len(host['running_processes'])}"
+    def _handle_isolate_segment(self, action: IsolateSegment) -> tuple[float, str]:
+        """Isolate a network segment."""
+        subnet = action.subnet
+        self._last_forensics = None
+        if subnet not in self._network:
+            return -0.05, f"Subnet '{subnet}' does not exist"
+        if subnet in self._state.isolated_subnets:
+            return -0.02, f"Subnet '{subnet}' is already isolated"
+        # Isolate all hosts in the subnet
+        for host in self._network[subnet]:
+            host["status"] = "isolated"
+        self._state.isolated_subnets.append(subnet)
+        # Check if this contains any active threats
+        reward = 0.0
+        threats_contained = []
+        for threat in self._task_def["attack_chain"]:
+            if threat["threat_id"] in self._state.active_threats:
+                # Check if any compromised hosts are in this subnet
+                for ch in threat["compromised_hosts"]:
+                    if ch in self._host_index and self._host_index[ch]["subnet"] == subnet:
+                        threats_contained.append(threat["threat_id"])
+                        break
+        if threats_contained:
+            reward = 0.15 * len(threats_contained)  # Good: containing lateral movement
+            for tid in threats_contained:
+                if tid not in self._state.contained_threats:
+                    self._state.contained_threats.append(tid)
+                if tid in self._state.active_threats:
+                    self._state.active_threats.remove(tid)
+        # Check if this is an unnecessary isolation (business downtime)
+        must_not_isolate = self._task_def["containment_requirements"].get("must_not_isolate", [])
+        if subnet in must_not_isolate:
+            reward -= 0.10  # Penalty: unnecessary downtime
+            self._state.business_impact = min(1.0, self._state.business_impact + 0.08)
+        return reward, f"Isolated subnet '{subnet}'. Threats contained: {threats_contained}"
+    def _handle_block_ioc(self, action: BlockIOC) -> tuple[float, str]:
+        """Block an IOC at the perimeter."""
+        ioc = action.ioc_value
+        self._last_forensics = None
+        if ioc in self._state.blocked_iocs:
+            return -0.02, f"IOC '{ioc}' is already blocked"
+        self._state.blocked_iocs.append(ioc)
+        # Check if this IOC is relevant to any active threat
+        reward = 0.0
+        relevant = False
+        for threat in self._task_def["attack_chain"]:
+            all_iocs = (
+                threat["iocs"].get("hashes", [])
+                + threat["iocs"].get("ips", [])
+                + threat["iocs"].get("domains", [])
+            )
+            if ioc in all_iocs:
+                relevant = True
+                # Extra reward for blocking C2 server IPs
+                if ioc in threat.get("c2_servers", []):
+                    reward += 0.15  # High value: cutting C2
+                else:
+                    reward += 0.10  # Good: blocking relevant IOC
+                break
+        if not relevant:
+            reward = -0.03  # Noise: blocking irrelevant IOC
+        return reward, f"Blocked IOC '{ioc}' (type={action.ioc_type}). Relevant: {relevant}"
+    def _handle_run_forensics(self, action: RunForensics) -> tuple[float, str]:
+        """Run forensic analysis on a host."""
+        hostname = action.hostname
+        if hostname not in self._host_index:
+            self._last_forensics = None
+            return -0.05, f"Host '{hostname}' not found"
+        host = self._host_index[hostname]
+        # Build forensics result based on actual host state
+        is_compromised = host["status"] == "compromised"
+        malicious_procs = []
+        suspicious_files = []
+        network_conns = []
+        registry_mods = []
+        memory_artifacts = []
+        if is_compromised:
+            # Find which threat(s) affect this host
+            for threat in self._task_def["attack_chain"]:
+                if hostname in threat["compromised_hosts"]:
+                    malicious_procs.extend(threat["malicious_processes"])
+                    # Generate deterministic forensic artifacts
+                    for proc in threat["malicious_processes"]:
+                        suspicious_files.append(f"C:\\Windows\\Temp\\{proc}.dat")
+                        registry_mods.append(f"HKLM\\Software\\Microsoft\\Windows\\CurrentVersion\\Run\\{proc}")
+                    for c2 in threat.get("c2_servers", []):
+                        network_conns.append(f"{c2}:443")
+                    for ioc_hash in threat["iocs"].get("hashes", []):
+                        memory_artifacts.append(f"memory_inject_{ioc_hash[:8]}")
+        self._last_forensics = ForensicsResult(
+            hostname=hostname,
+            malicious_processes=malicious_procs,
+            suspicious_files=suspicious_files,
+            network_connections=network_conns,
+            registry_modifications=registry_mods,
+            memory_artifacts=memory_artifacts,
+            is_compromised=is_compromised,
+        )
+        # Reward
+        reward = 0.0
+        if hostname not in self._state.forensics_run:
+            if is_compromised:
+                reward = 0.10  # Good: found evidence
+            else:
+                reward = 0.02  # Cleared a host (some value)
+            self._state.forensics_run.append(hostname)
+        else:
+            reward = -0.02  # Re-running forensics wastes time
+        return reward, f"Forensics on {hostname}: compromised={is_compromised}, procs={malicious_procs}"
+    def _handle_kill_process(self, action: KillProcess) -> tuple[float, str]:
+        """Kill a process on a host."""
+        hostname = action.hostname
+        process = action.process_name
+        self._last_forensics = None
+        if hostname not in self._host_index:
+            return -0.05, f"Host '{hostname}' not found"
+        host = self._host_index[hostname]
+        if host["status"] == "isolated":
+            return -0.02, f"Host '{hostname}' is isolated — cannot interact"
+        if process not in host["running_processes"]:
+            return -0.03, f"Process '{process}' not running on {hostname}"
+        # Kill the process
+        host["running_processes"].remove(process)
+        self._state.killed_processes.append({"hostname": hostname, "process": process})
+        # Check if this was a malicious process
+        reward = 0.0
+        was_malicious = False
+        for threat in self._task_def["attack_chain"]:
+            if hostname in threat["compromised_hosts"] and process in threat["malicious_processes"]:
+                was_malicious = True
+                reward = 0.15  # Major reward: stopping malicious activity
+                # Check if all processes for this threat are killed on every known host
+                all_killed = not any(
+                    th_host in self._host_index
+                    and th_proc in self._host_index[th_host]["running_processes"]
+                    for th_host in threat["compromised_hosts"]
+                    for th_proc in threat["malicious_processes"]
+                )
+                if all_killed and threat["threat_id"] in self._state.active_threats:
+                    self._state.active_threats.remove(threat["threat_id"])
+                    if threat["threat_id"] not in self._state.contained_threats:
+                        self._state.contained_threats.append(threat["threat_id"])
+                    reward += 0.10  # Bonus: fully contained a threat
+                break
+        if not was_malicious:
+            reward = -0.08  # Penalty: killing legitimate process = downtime
+            self._state.business_impact = min(1.0, self._state.business_impact + 0.03)
+        return reward, f"Killed '{process}' on {hostname}. Malicious: {was_malicious}"
+    def _handle_submit_plan(self, action: SubmitContainmentPlan) -> tuple[float, str]:
+        """Submit the final containment plan."""
+        self._last_forensics = None
+        self._state.submitted_plan = True
+        self._plan_entries = [entry.model_dump() for entry in action.plan]
+        # Grade the episode
+        final_score = grade_episode(
+            task_id=self._state.task_id,
+            task_def=self._task_def,
+            killed_processes=self._state.killed_processes,
+            blocked_iocs=self._state.blocked_iocs,
+            forensics_run=self._state.forensics_run,
+            isolated_subnets=self._state.isolated_subnets,
+            submitted_plan=True,
+            plan_entries=self._plan_entries,
+            final_business_impact=self._state.business_impact,
+            step_count=self._state.step_count,
+            total_reward=self._state.total_reward,
+        )
+        # Reward proportional to final grade
+        reward = final_score * 1.0  # Scale: perfect score = 1.0 reward
+        description = (
+            f"Containment plan submitted. "
+            f"Grade: {final_score:.3f}. "
+            f"Threats contained: {len(self._state.contained_threats)}/{len(self._task_def['attack_chain'])}. "
+            f"Business impact: {self._state.business_impact:.2f}"
+        )
+        return reward, description
+    # ===========================================================================
+    # Helpers
+    # ===========================================================================
+    def _build_observation(self, reward: float, done: bool) -> SOCObservation:
+        """Build the observation from current state."""
+        # Compute network topology summary
+        subnet_counts = {name: len(hosts) for name, hosts in self._network.items()}
+        compromised = sum(
+            1 for hosts in self._network.values()
+            for h in hosts if h["status"] == "compromised"
+        )
+        isolated = sum(
+            1 for hosts in self._network.values()
+            for h in hosts if h["status"] == "isolated"
+        )
+        total = sum(len(hosts) for hosts in self._network.values())
+        topology = NetworkTopology(
+            total_hosts=total,
+            subnets=subnet_counts,
+            compromised_count=compromised,
+            isolated_count=isolated,
+            online_count=total - compromised - isolated,
+        )
+        # Build alert list
+        alerts = [Alert(**a) for a in self._alert_queue]
+        # Build timeline
+        timeline = [
+            TimelineEntry(
+                step=t["step"],
+                action_type=t["action_type"],
+                target=t["target"],
+                result=t["result"],
+                reward=t["reward"],
+            )
+            for t in self._state.timeline
+        ]
+        # Compute final grade if done
+        final_score_val = None
+        grade_breakdown_val = None
+        if done and self._state.submitted_plan:
+            computed_score = grade_episode(
+                task_id=self._state.task_id,
+                task_def=self._task_def,
+                killed_processes=self._state.killed_processes,
+                blocked_iocs=self._state.blocked_iocs,
+                forensics_run=self._state.forensics_run,
+                isolated_subnets=self._state.isolated_subnets,
+                submitted_plan=self._state.submitted_plan,
+                plan_entries=self._plan_entries,
+                final_business_impact=self._state.business_impact,
+                step_count=self._state.step_count,
+                total_reward=self._state.total_reward,
+            )
+            final_score_val = round(computed_score, 4)
+            grade_breakdown_val = {
+                "threats_contained": len(self._state.contained_threats),
+                "total_threats": len(self._task_def["attack_chain"]),
+                "iocs_blocked": len(self._state.blocked_iocs),
+                "hosts_forensics": len(self._state.forensics_run),
+                "subnets_isolated": len(self._state.isolated_subnets),
+                "business_impact": round(self._state.business_impact, 4),
+            }
+        return SOCObservation(
+            alert_queue=alerts,
+            network_topology=topology,
+            host_forensics=self._last_forensics,
+            timeline=timeline,
+            business_impact_score=round(self._state.business_impact, 4),
+            step_count=self._state.step_count,
+            active_threats=list(self._state.active_threats),
+            max_steps=self._state.max_steps,
+            task_id=self._state.task_id,
+            total_reward=round(self._state.total_reward, 4),
+            final_score=final_score_val,
+            grade_breakdown=grade_breakdown_val,
+            done=done,
+            reward=round(reward, 4),
+        )
+    def _get_action_target(self, action: Any) -> str:
+        """Extract the target string from a typed action for timeline logging."""
+        if isinstance(action, QueryHost):
+            return action.hostname
+        elif isinstance(action, IsolateSegment):
+            return action.subnet
+        elif isinstance(action, BlockIOC):
+            return f"{action.ioc_type}:{action.ioc_value}"
+        elif isinstance(action, RunForensics):
+            return action.hostname
+        elif isinstance(action, KillProcess):
+            return f"{action.hostname}/{action.process_name}"
+        elif isinstance(action, SubmitContainmentPlan):
+            return f"{len(action.plan)} entries"
+        return "unknown"
+    @property
+    def state(self) -> SOCState:
+        """Get the current internal environment state."""
+        return self._state
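
Note on the per-step business impact in `step()`: the update adds `impact_per_step` scaled by the fraction of threats still active, and caps the result at 1.0. A minimal standalone sketch of that arithmetic follows. The function name `next_business_impact` and the sample numbers are illustrative assumptions, not part of the commit.

```python
def next_business_impact(
    current: float,
    impact_rate: float,
    active_threats: int,
    total_threats: int,
) -> float:
    """Grow impact by rate * (active / total) and cap at 1.0, as step() does."""
    # max(1, ...) guards against division by zero for an empty attack chain
    active_ratio = active_threats / max(1, total_threats)
    return min(1.0, current + impact_rate * active_ratio)


# Hypothetical episode: 2 of 4 threats still active, default rate 0.02
impact = next_business_impact(0.10, 0.02, active_threats=2, total_threats=4)
print(round(impact, 4))  # 0.11
```

Once every threat is contained the ratio is zero, so the impact score stops growing even if the agent takes further steps before submitting its plan.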

server/requirements.txt ADDED

	@@ -0,0 +1,3 @@

+openenv-core[core]>=0.2.2
+openai>=1.0.0
+websockets>=12.0
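
Note on deterministic seeding: the module docstring of `server/task_generator.py` describes mapping a `task_id` to a seed for a `random.Random` instance. Python's built-in `hash()` on strings is salted per process unless `PYTHONHASHSEED` is fixed, so a digest from `hashlib` is the usual way to get a stable seed. The sketch below shows one way to do this. The helper names `seed_from_task_id` and `rng_for_task`, and the choice of SHA-256 truncated to 8 bytes, are assumptions for illustration; the commit's actual derivation is not visible in this part of the diff.

```python
import hashlib
import random


def seed_from_task_id(task_id: str) -> int:
    """Derive a stable 64-bit integer seed from a task identifier."""
    digest = hashlib.sha256(task_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned big-endian integer
    return int.from_bytes(digest[:8], "big")


def rng_for_task(task_id: str) -> random.Random:
    """Return a private RNG so scenario generation never touches global state."""
    return random.Random(seed_from_task_id(task_id))


# Two generators built from the same task_id produce the same sequence
a = rng_for_task("gen_0001")
b = rng_for_task("gen_0001")
same = [a.randint(0, 999) for _ in range(5)] == [b.randint(0, 999) for _ in range(5)]
print(same)  # True
```

Using a dedicated `random.Random` instance rather than the module-level functions keeps each scenario independent of any other code that seeds or consumes the global generator.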

server/task_generator.py ADDED

	@@ -0,0 +1,627 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+"""
+Procedural Task Generator for CyberSOCEnv.
+Generates 1000+ unique, deterministic attack scenarios from a task_id seed.
+Each task_id (e.g. 'gen_0001') always produces the exact same scenario.
+Design:
+    - hashlib digest of task_id (not the per-process salted built-in hash()) -> deterministic seed -> random.Random instance
+    - Seed drives ALL choices: attack type, hosts, processes, IOCs, alerts
+    - 12 attack categories, 50+ malware names, 40+ C2 domains
+    - 3 difficulty tiers based on task number
+No actual randomness — reproducible across runs and platforms.
+"""
+from __future__ import annotations
+import hashlib
+import random
+from typing import Any, Dict, List, Tuple
+# =============================================================================
+# Template Pools (the "vocabulary" of the generator)
+# =============================================================================
+# --- Malware process names by category ---
+MALWARE_PROCESSES = {
+    "ransomware": [
+        "cryptolocker.exe", "wannacry.exe", "blackcat_ransom.exe",
+        "lockbit3.exe", "revil_encrypt.exe", "hive_locker.exe",
+        "conti_crypt.exe", "ryuk_payload.exe", "maze_encrypt.exe",
+        "darkside_enc.exe", "babuk_lock.exe", "avaddon_crypt.exe",
+    ],
+    "phishing": [
+        "outlook_macro.exe", "word_dropper.exe", "macro_loader.exe",
+        "vba_agent.exe", "pdf_exploit.exe", "html_smuggler.exe",
+        "iso_mounter.exe", "lnk_runner.exe",
+    ],
+    "credential_theft": [
+        "mimikatz.exe", "lazagne.exe", "hashdump.exe",
+        "procdump_lsass.exe", "rubeus.exe", "kerbrute.exe",
+        "sharphound.exe", "bloodhound_collect.exe",
+    ],
+    "lateral_movement": [
+        "svchost_backdoor.exe", "psexec_svc.exe", "wmic_lateral.exe",
+        "rdp_hijack.exe", "ssh_brute.exe", "evil_winrm.exe",
+        "dcom_exec.exe", "smb_relay.exe",
+    ],
+    "c2_communication": [
+        "svchost_c2.exe", "cobalt_beacon.exe", "sliver_implant.exe",
+        "meterpreter.exe", "covenant_grunt.exe", "mythic_agent.exe",
+        "dns_tunnel.exe", "icmp_beacon.exe",
+    ],
+    "privilege_escalation": [
+        "exploit_kernel.exe", "potato_exploit.exe", "uac_bypass.exe",
+        "printspoofer.exe", "juicy_potato.exe", "named_pipe_exploit.exe",
+        "token_impersonate.exe", "dll_hijack.exe",
+    ],
+    "data_exfiltration": [
+        "data_pump.exe", "rclone_sync.exe", "mega_upload.exe",
+        "ftp_exfil.exe", "dns_exfil.exe", "cloud_sync_mal.exe",
+        "archive_send.exe", "stealer_agent.exe",
+    ],
+    "cryptomining": [
+        "xmrig_miner.exe", "ethminer.exe", "cpuminer.exe",
+        "nicehash_mal.exe", "coinhive_svc.exe", "monero_mine.exe",
+    ],
+    "supply_chain": [
+        "update_agent_mal.exe", "npm_backdoor.exe", "pip_trojan.exe",
+        "vscode_ext_mal.exe", "docker_implant.exe", "nuget_poison.exe",
+    ],
+    "insider_threat": [
+        "usb_copy.exe", "screen_capture.exe", "keylogger_svc.exe",
+        "email_forward.exe", "cloud_upload.exe", "print_spooler_mal.exe",
+    ],
+    "webshell": [
+        "cmd_webshell.php", "asp_backdoor.exe", "jsp_shell.exe",
+        "python_rshell.exe", "nodejs_shell.exe", "perl_cgi_shell.exe",
+    ],
+    "botnet": [
+        "mirai_bot.exe", "emotet_loader.exe", "trickbot_svc.exe",
+        "qbot_agent.exe", "dridex_dll.exe", "zloader_inject.exe",
+    ],
+}
+# --- C2 domains ---
+C2_DOMAINS = [
+    "cdn-update.malware-c2.net", "api.darkc2.io", "telemetry-svc.ru",
+    "secure-update.evil.net", "cdn.payload-delivery.com", "api.shadownet.io",
+    "sync.cloud-c2.xyz", "update.legit-looking.com", "beacon.covert-ops.net",
+    "dns.tunnel-relay.org", "img.cdn-malware.com", "static.evil-cdn.net",
+    "api.stealthc2.io", "ws.encrypted-relay.net", "feed.darkweb-proxy.com",
+    "auth.phish-server.net", "login.fake-portal.com", "mail.spoof-relay.org",
+    "git.supply-chain.dev", "npm.compromised-pkg.io", "pypi.trojan-lib.org",
+    "dl.ransomware-pay.onion", "tor.exit-node-c2.net", "i2p.covert-chan.net",
+    "iot.botnet-c2.xyz", "cam.mirai-variant.net", "mqtt.iot-exploit.io",
+    "ftp.exfil-server.ru", "sftp.data-steal.com", "mega.cloud-drop.io",
+    "gist.code-exfil.dev", "paste.data-dump.xyz", "bin.steganography.net",
+    "vpn.tunnel-c2.com", "proxy.relay-beacon.org", "socks.covert-proxy.io",
+    "wpad.evil-config.net", "ntp.time-beacon.com", "ldap.ad-exploit.org",
+    "kerberos.ticket-steal.net",
+]
+# --- C2 IPs (RFC 5737 documentation ranges plus RFC 6598 shared address space 100.64.0.0/10) ---
+C2_IPS = [
+    "198.51.100.10", "198.51.100.22", "198.51.100.33", "198.51.100.44",
+    "198.51.100.55", "198.51.100.66", "198.51.100.77", "198.51.100.88",
+    "198.51.100.99", "198.51.100.110", "198.51.100.121", "198.51.100.132",
+    "203.0.113.10", "203.0.113.21", "203.0.113.32", "203.0.113.43",
+    "203.0.113.54", "203.0.113.65", "203.0.113.76", "203.0.113.87",
+    "203.0.113.98", "203.0.113.109", "203.0.113.120", "203.0.113.131",
+    "192.0.2.10", "192.0.2.21", "192.0.2.32", "192.0.2.43",
+    "192.0.2.54", "192.0.2.65", "192.0.2.76", "192.0.2.87",
+    "100.64.0.10", "100.64.0.22", "100.64.0.33", "100.64.0.44",
+]
+# --- Subnet definitions (must match build_network() in tasks.py) ---
+SUBNETS = {
+    "corporate":   {"prefix": "WS",   "count": 150, "criticality": 0.3},
+    "engineering": {"prefix": "DEV",   "count": 100, "criticality": 0.5},
+    "finance":     {"prefix": "FIN",   "count": 50,  "criticality": 0.8},
+    "dmz":         {"prefix": "DMZ",   "count": 30,  "criticality": 0.6},
+    "datacenter":  {"prefix": "SRV",   "count": 50,  "criticality": 0.9},
+    "executive":   {"prefix": "EXEC",  "count": 20,  "criticality": 1.0},
+}
+# --- Attack phases in kill-chain order ---
+ATTACK_PHASES = [
+    "initial_access", "execution", "persistence", "privilege_escalation",
+    "credential_access", "lateral_movement", "command_and_control",
+    "exfiltration", "impact",
+]
+# --- Alert description templates ---
+ALERT_TEMPLATES = {
+    "ransomware": [
+        "EDR detected file encryption activity on {host}. Process '{proc}' is encrypting files in user directories.",
+        "Anomalous file system activity: {count} files renamed with .{ext} extension in {secs} seconds on {host}.",
+        "Ransomware signature detected in process '{proc}' on {host}. Volume shadow copies being deleted.",
+    ],
+    "phishing": [
+        "User on {host} clicked suspicious link in email. {proc} execution detected downloading payload from {domain}.",
+        "Macro-enabled document opened on {host}. Outbound connection to {domain} detected.",
+        "Suspicious email attachment executed on {host}. Process '{proc}' spawned child processes.",
+    ],
+    "credential_theft": [
+        "LSASS memory access detected on {host} — possible credential dumping via {proc}.",
+        "Kerberos ticket request anomaly on {host}. Process '{proc}' attempting ticket manipulation.",
+        "SAM database access detected on {host}. Credential harvesting tool '{proc}' identified.",
+    ],
+    "lateral_movement": [
+        "Suspicious RDP login to {host} from compromised source using admin credentials. Process '{proc}' spawned.",
+        "SMB lateral movement detected: '{proc}' deployed on {host} via remote service creation.",
+        "WMI remote execution detected on {host}. Process '{proc}' launched from external host.",
+    ],
+    "c2_communication": [
+        "Periodic beaconing detected from {host} to {ip} every {interval} seconds. Encrypted payload exchange observed.",
+        "DNS tunneling activity from {host}. Suspicious queries to {domain} with encoded payloads.",
+        "Cobalt Strike beacon profile detected on {host}. Process '{proc}' communicating with {ip}.",
+    ],
+    "privilege_escalation": [
+        "Kernel exploit attempt on {host}. Process '{proc}' gained SYSTEM privileges.",
+        "UAC bypass detected on {host}. Process '{proc}' elevated to admin without user consent.",
+        "Token impersonation attack on {host}. Process '{proc}' obtained domain admin token.",
+    ],
+    "data_exfiltration": [
+        "Large data transfer ({size} GB) to external IP {ip} from {host}. Possible exfiltration of {data_type}.",
+        "Staging activity detected on {host}. Process '{proc}' archiving sensitive directories for extraction.",
+        "Cloud storage upload from {host} to unauthorized account. Process '{proc}' transferring {data_type}.",
+    ],
+    "cryptomining": [
+        "High CPU usage (98%) on {host}. Process '{proc}' identified as cryptocurrency miner.",
+        "Mining pool connection from {host} to {ip}:{port}. Process '{proc}' consuming all available cores.",
+        "Stratum protocol detected on {host}. Unauthorized mining process '{proc}' active.",
+    ],
+    "supply_chain": [
+        "Compromised package detected in CI/CD pipeline on {host}. Process '{proc}' executing post-install scripts.",
+        "Backdoored update agent on {host}. Process '{proc}' downloading payloads from {domain}.",
+        "Malicious dependency loaded on {host}. Process '{proc}' establishing covert communication channels.",
+    ],
+    "insider_threat": [
+        "Unusual data access pattern on {host}. Process '{proc}' accessing files outside user's normal scope.",
+        "USB mass storage device connected on {host}. Process '{proc}' copying sensitive files to removable media.",
+        "After-hours bulk file download on {host}. Process '{proc}' archiving {data_type} documents.",
+    ],
+    "webshell": [
+        "Web shell detected on {host}. Process '{proc}' executing system commands via HTTP POST requests.",
+        "Suspicious file upload on {host}. Process '{proc}' created in web-accessible directory with bash capabilities.",
+        "Remote code execution on {host}. Process '{proc}' spawned from web server with SYSTEM context.",
+    ],
+    "botnet": [
+        "Bot agent detected on {host}. Process '{proc}' joining command pool at {ip}.",
+        "DDoS toolkit loaded on {host}. Process '{proc}' ready to receive attack instructions from {domain}.",
+        "Worm propagation from {host}. Process '{proc}' scanning network for vulnerable hosts.",
+    ],
+}
+# --- Severity levels with weights ---
+SEVERITIES = ["low", "medium", "high", "critical"]
+SEVERITY_WEIGHTS = {"easy": [0.1, 0.4, 0.4, 0.1], "medium": [0.0, 0.2, 0.5, 0.3], "hard": [0.0, 0.1, 0.3, 0.6]}
+# --- Data types for exfil descriptions ---
+DATA_TYPES = [
+    "customer PII", "financial records", "employee credentials",
+    "source code", "trade secrets", "medical records",
+    "encryption keys", "database backups", "API tokens",
+    "board meeting minutes", "M&A documents", "patent filings",
+]
+# --- File extensions for ransomware ---
+RANSOM_EXTENSIONS = [
+    "locked", "encrypted", "crypted", "crypt", "enc", "pay",
+    "ransom", "darkside", "blackcat", "hive", "lockbit", "ryuk",
+]
+# =============================================================================
+# Deterministic Seed Helper
+# =============================================================================
+def _seed_from_task_id(task_id: str) -> int:
+    """Create a deterministic integer seed from a task_id string."""
+    h = hashlib.sha256(task_id.encode("utf-8")).hexdigest()
+    return int(h[:16], 16)
+def _make_hash(rng: random.Random) -> str:
+    """Generate a fake MD5-like hash deterministically."""
+    return "".join(rng.choice("0123456789abcdef") for _ in range(32))
+# =============================================================================
+# Difficulty Classification
+# =============================================================================
+def _get_difficulty(task_id: str, rng: random.Random) -> str:
+    """Determine difficulty from task_id pattern or seed."""
+    # If task_id has an explicit difficulty prefix, use it
+    if task_id.startswith("easy_") or task_id.startswith("gen_easy_"):
+        return "easy"
+    if task_id.startswith("medium_") or task_id.startswith("gen_medium_"):
+        return "medium"
+    if task_id.startswith("hard_") or task_id.startswith("gen_hard_"):
+        return "hard"
+    # For the gen_NNNN pattern, bucket by number: 1-333 easy, 334-666 medium, 667+ hard
+    if task_id.startswith("gen_"):
+        try:
+            num = int(task_id.split("_")[1])
+            if num <= 333:
+                return "easy"
+            elif num <= 666:
+                return "medium"
+            else:
+                return "hard"
+        except (ValueError, IndexError):
+            pass
+    # Fallback: use seed-based distribution
+    return rng.choice(["easy", "medium", "hard"])
+# =============================================================================
+# Core Generator
+# =============================================================================
+def _pick_hosts(rng: random.Random, subnet: str, count: int) -> List[str]:
+    """Pick `count` unique host names from a subnet."""
+    info = SUBNETS[subnet]
+    prefix = info["prefix"]
+    max_idx = info["count"]
+    indices = rng.sample(range(1, max_idx + 1), min(count, max_idx))
+    return [f"{prefix}-{idx:03d}" for idx in indices]
+def _pick_subnets(rng: random.Random, count: int) -> List[str]:
+    """Pick `count` unique subnet names."""
+    all_subnets = list(SUBNETS.keys())
+    return rng.sample(all_subnets, min(count, len(all_subnets)))
+def _generate_threat(
+    rng: random.Random,
+    threat_id: str,
+    attack_type: str,
+    phase: str,
+    available_subnets: List[str],
+    used_hosts: set,
+) -> Tuple[Dict[str, Any], List[str]]:
+    """Generate a single threat in the attack chain.
+    Returns:
+        (threat_dict, list_of_compromised_hosts)
+    """
+    # Pick target subnet and hosts
+    subnet = rng.choice(available_subnets)
+    num_hosts = rng.randint(1, 3) if attack_type != "ransomware" else rng.randint(1, 2)
+    hosts = _pick_hosts(rng, subnet, num_hosts + 3)  # Pick extra to avoid collisions
+    hosts = [h for h in hosts if h not in used_hosts][:num_hosts]
+    if not hosts:
+        # Fallback: pick from any subnet
+        fallback_subnet = rng.choice(list(SUBNETS.keys()))
+        hosts = _pick_hosts(rng, fallback_subnet, num_hosts + 5)
+        hosts = [h for h in hosts if h not in used_hosts][:max(1, num_hosts)]
+    # Pick malware process
+    procs = MALWARE_PROCESSES.get(attack_type, MALWARE_PROCESSES["lateral_movement"])
+    proc = rng.choice(procs)
+    # Generate IOCs
+    num_hashes = rng.randint(1, 2)
+    hashes = [_make_hash(rng) for _ in range(num_hashes)]
+    num_ips = rng.randint(0, 2) if attack_type in ("c2_communication", "data_exfiltration", "cryptomining", "botnet") else rng.randint(0, 1)
+    ips = rng.sample(C2_IPS, min(num_ips, len(C2_IPS))) if num_ips > 0 else []
+    num_domains = rng.randint(0, 2) if attack_type in ("c2_communication", "phishing", "supply_chain", "botnet") else rng.randint(0, 1)
+    domains = rng.sample(C2_DOMAINS, min(num_domains, len(C2_DOMAINS))) if num_domains > 0 else []
+    # C2 servers (subset of IPs for c2/exfil types)
+    c2_servers = ips[:1] if attack_type in ("c2_communication", "data_exfiltration", "botnet") else []
+    # Lateral targets (for movement-type threats)
+    lateral_targets: List[str] = []
+    if attack_type in ("lateral_movement", "credential_theft", "c2_communication"):
+        lat_subnet = rng.choice(list(SUBNETS.keys()))
+        lat_hosts = _pick_hosts(rng, lat_subnet, 2)
+        lateral_targets = [h for h in lat_hosts if h not in used_hosts and h not in hosts][:rng.randint(0, 2)]
+    # Exfil targets
+    exfil_targets: List[str] = []
+    if attack_type == "data_exfiltration":
+        exfil_targets = list(hosts)
+    threat = {
+        "threat_id": threat_id,
+        "threat_type": attack_type,
+        "phase": phase,
+        "compromised_hosts": hosts,
+        "malicious_processes": [proc],
+        "c2_servers": c2_servers,
+        "iocs": {
+            "hashes": hashes,
+            "ips": ips,
+            "domains": domains,
+        },
+        "lateral_targets": lateral_targets,
+        "exfil_targets": exfil_targets,
+    }
+    return threat, hosts
+def _generate_alert(
+    rng: random.Random,
+    alert_idx: int,
+    task_prefix: str,
+    threat: Dict[str, Any],
+    timestamp_base: int,
+) -> Dict[str, Any]:
+    """Generate a single SIEM alert for a threat."""
+    attack_type = threat["threat_type"]
+    host = rng.choice(threat["compromised_hosts"])
+    proc = threat["malicious_processes"][0]
+    # Pick template
+    templates = ALERT_TEMPLATES.get(attack_type, ALERT_TEMPLATES["lateral_movement"])
+    template = rng.choice(templates)
+    # Fill template
+    description = template.format(
+        host=host,
+        proc=proc,
+        domain=rng.choice(threat["iocs"]["domains"]) if threat["iocs"]["domains"] else "unknown.example.com",
+        ip=rng.choice(threat["iocs"]["ips"]) if threat["iocs"]["ips"] else "0.0.0.0",
+        count=rng.randint(50, 500),
+        ext=rng.choice(RANSOM_EXTENSIONS),
+        secs=rng.randint(10, 120),
+        interval=rng.choice([30, 60, 90, 120, 300]),
+        size=round(rng.uniform(0.5, 15.0), 1),
+        data_type=rng.choice(DATA_TYPES),
+        port=rng.choice([3333, 4444, 5555, 8080, 8443, 9090]),
+    )
+    # Collect IOC indicators for the alert
+    ioc_indicators = []
+    if threat["iocs"]["hashes"]:
+        ioc_indicators.append(rng.choice(threat["iocs"]["hashes"]))
+    if threat["iocs"]["ips"]:
+        ioc_indicators.append(rng.choice(threat["iocs"]["ips"]))
+    if threat["iocs"]["domains"]:
+        ioc_indicators.append(rng.choice(threat["iocs"]["domains"]))
+    # Determine subnet from host prefix
+    subnet = "corporate"
+    for sn, info in SUBNETS.items():
+        if host.startswith(info["prefix"]):
+            subnet = sn
+            break
+    # Severity: high-impact attack types draw from the "hard" weights, all others from "medium"
+    severity_key = (
+        "hard" if attack_type in ("data_exfiltration", "ransomware", "privilege_escalation") else "medium"
+    )
+    severity_weights = SEVERITY_WEIGHTS[severity_key]
+    severity = rng.choices(SEVERITIES, weights=severity_weights, k=1)[0]
+    # Timestamp (spread across a few hours)
+    minutes_offset = timestamp_base + alert_idx * rng.randint(5, 45)
+    hour = 6 + (minutes_offset // 60)
+    minute = minutes_offset % 60
+    timestamp = f"2025-01-15T{hour:02d}:{minute:02d}:00Z"
+    return {
+        "alert_id": f"ALERT-{task_prefix}{alert_idx + 1:03d}",
+        "timestamp": timestamp,
+        "source_host": host,
+        "severity": severity,
+        "threat_type": attack_type,
+        "description": description,
+        "ioc_indicators": ioc_indicators,
+        "subnet": subnet,
+        "is_acknowledged": False,
+    }
+# =============================================================================
+# Main Generator Function
+# =============================================================================
+def generate_task(task_id: str) -> Dict[str, Any]:
+    """Generate a complete, deterministic task definition from a task_id.
+    The task_id is hashed to create a seed, ensuring the same task_id
+    always produces the exact same scenario.
+    Args:
+        task_id: Any string (e.g. 'gen_0001', 'gen_0500', 'phishing_test')
+    Returns:
+        A task_def dict compatible with CyberSOCEnvironment.reset()
+    """
+    seed = _seed_from_task_id(task_id)
+    rng = random.Random(seed)
+    # Determine difficulty
+    difficulty = _get_difficulty(task_id, rng)
+    # Configure parameters based on difficulty
+    if difficulty == "easy":
+        num_threats = 1
+        max_steps = rng.randint(12, 18)
+        initial_impact = round(rng.uniform(0.02, 0.08), 2)
+        impact_per_step = round(rng.uniform(0.01, 0.03), 3)
+        num_subnets = rng.randint(1, 2)
+    elif difficulty == "medium":
+        num_threats = rng.randint(2, 3)
+        max_steps = rng.randint(20, 28)
+        initial_impact = round(rng.uniform(0.08, 0.15), 2)
+        impact_per_step = round(rng.uniform(0.02, 0.04), 3)
+        num_subnets = rng.randint(2, 4)
+    else:  # hard
+        num_threats = rng.randint(3, 6)
+        max_steps = rng.randint(25, 35)
+        initial_impact = round(rng.uniform(0.15, 0.25), 2)
+        impact_per_step = round(rng.uniform(0.03, 0.05), 3)
+        num_subnets = rng.randint(3, 6)
+    # Pick attack types for this scenario
+    all_attack_types = list(MALWARE_PROCESSES.keys())
+    if difficulty == "easy":
+        # Easy: single focused attack
+        attack_types = [rng.choice(all_attack_types)]
+    elif difficulty == "medium":
+        # Medium: multi-stage, pick a plausible chain
+        chains = [
+            ["phishing", "credential_theft", "lateral_movement"],
+            ["phishing", "c2_communication", "data_exfiltration"],
+            ["webshell", "privilege_escalation", "lateral_movement"],
+            ["supply_chain", "c2_communication", "credential_theft"],
+            ["botnet", "cryptomining", "lateral_movement"],
+            ["insider_threat", "data_exfiltration"],
+        ]
+        chain = rng.choice(chains)
+        attack_types = chain[:num_threats]
+    else:
+        # Hard: complex multi-phase APT
+        chains = [
+            ["phishing", "c2_communication", "privilege_escalation", "data_exfiltration", "ransomware"],
+            ["supply_chain", "c2_communication", "lateral_movement", "credential_theft", "data_exfiltration", "ransomware"],
+            ["webshell", "privilege_escalation", "c2_communication", "lateral_movement", "data_exfiltration"],
+            ["phishing", "credential_theft", "lateral_movement", "cryptomining", "botnet"],
+            ["insider_threat", "privilege_escalation", "data_exfiltration", "c2_communication"],
+            ["botnet", "lateral_movement", "privilege_escalation", "ransomware", "data_exfiltration"],
+        ]
+        chain = rng.choice(chains)
+        attack_types = chain[:num_threats]
+    # Pick subnets involved
+    involved_subnets = _pick_subnets(rng, num_subnets)
+    # Generate attack chain
+    attack_chain: List[Dict[str, Any]] = []
+    used_hosts: set = set()
+    task_prefix = task_id.replace("gen_", "G").upper()[:6]
+    for i, attack_type in enumerate(attack_types):
+        phase_idx = min(i, len(ATTACK_PHASES) - 1)
+        # Use realistic phase based on attack type
+        phase_map = {
+            "phishing": "initial_access",
+            "webshell": "initial_access",
+            "supply_chain": "initial_access",
+            "credential_theft": "credential_access",
+            "privilege_escalation": "privilege_escalation",
+            "lateral_movement": "lateral_movement",
+            "c2_communication": "command_and_control",
+            "data_exfiltration": "exfiltration",
+            "ransomware": "impact",
+            "cryptomining": "impact",
+            "insider_threat": "exfiltration",
+            "botnet": "command_and_control",
+        }
+        phase = phase_map.get(attack_type, ATTACK_PHASES[phase_idx])
+        threat_id = f"T-{task_prefix}-{i + 1:03d}"
+        threat, new_hosts = _generate_threat(
+            rng, threat_id, attack_type, phase, involved_subnets, used_hosts
+        )
+        attack_chain.append(threat)
+        used_hosts.update(new_hosts)
+    # Generate alerts (1-2 per threat)
+    initial_alerts: List[Dict[str, Any]] = []
+    timestamp_base = rng.randint(0, 60)
+    for threat in attack_chain:
+        num_alerts = rng.randint(1, 2)
+        for _ in range(num_alerts):
+            alert = _generate_alert(
+                rng, len(initial_alerts), task_prefix, threat, timestamp_base
+            )
+            initial_alerts.append(alert)
+    # Generate containment requirements
+    must_kill = []
+    must_block_iocs = []
+    must_forensics = []
+    must_not_isolate = []
+    for threat in attack_chain:
+        for host in threat["compromised_hosts"]:
+            for proc in threat["malicious_processes"]:
+                must_kill.append({"hostname": host, "process": proc})
+            if host not in must_forensics:
+                must_forensics.append(host)
+        # Collect all IOCs as required blocks
+        for h in threat["iocs"]["hashes"]:
+            if h not in must_block_iocs:
+                must_block_iocs.append(h)
+        for ip in threat["iocs"]["ips"]:
+            if ip not in must_block_iocs:
+                must_block_iocs.append(ip)
+        for d in threat["iocs"]["domains"]:
+            if d not in must_block_iocs:
+                must_block_iocs.append(d)
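+    # Subnets that should NOT be isolated: those not involved in the attack.
+    # Easy protects all of them, medium only those with criticality >= 0.8,
+    # and hard leaves the list empty.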
+    non_involved = [s for s in SUBNETS if s not in involved_subnets]
+    if difficulty == "easy":
+        must_not_isolate = non_involved
+    elif difficulty == "medium":
+        must_not_isolate = [s for s in non_involved if SUBNETS[s]["criticality"] >= 0.8]
+    # Build description
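+    # Ordered de-duplication: iterating a set of strings varies with PYTHONHASHSEED,
+    # which would make the description differ between runs of the same task_id.
+    type_names = list(dict.fromkeys(t["threat_type"] for t in attack_chain))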
+    host_count = len(used_hosts)
+    desc = (
+        f"[{difficulty.upper()}] {', '.join(type_names).replace('_', ' ').title()} "
+        f"across {host_count} host(s) in {', '.join(involved_subnets)}."
+    )
+    return {
+        "description": desc,
+        "max_steps": max_steps,
+        "initial_business_impact": initial_impact,
+        "impact_per_step": impact_per_step,
+        "attack_chain": attack_chain,
+        "initial_alerts": initial_alerts,
+        "optimal_actions": [
+            "run_forensics", "kill_process", "block_ioc", "submit_containment_plan"
+        ],
+        "containment_requirements": {
+            "must_kill": must_kill,
+            "must_block_iocs": must_block_iocs,
+            "must_forensics": must_forensics,
+            "must_not_isolate": must_not_isolate,
+        },
+    }
+# =============================================================================
+# Batch Generation (for openenv.yaml and validation)
+# =============================================================================
+def list_generated_task_ids(count: int = 1000) -> List[str]:
+    """Return the list of generated task IDs."""
+    return [f"gen_{i:04d}" for i in range(1, count + 1)]
+def get_task_summary(task_id: str) -> Dict[str, Any]:
+    """Get a short summary of a generated task (for openenv.yaml)."""
+    task_def = generate_task(task_id)
+    difficulty = _get_difficulty(task_id, random.Random(_seed_from_task_id(task_id)))
+    return {
+        "description": task_def["description"],
+        "max_steps": task_def["max_steps"],
+        "difficulty": difficulty,
+    }
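
The determinism of `generate_task` rests on hashing the task ID into a seed for a private `random.Random` instance. The following is a minimal standalone sketch of that pattern, not the generator itself: `seed_from_task_id` mirrors `_seed_from_task_id` above, while `sample_scenario` and the fields it returns are hypothetical stand-ins chosen for illustration.

```python
import hashlib
import random
from typing import Any, Dict


def seed_from_task_id(task_id: str) -> int:
    """Derive a stable 64-bit seed from the first 16 hex digits of SHA-256."""
    digest = hashlib.sha256(task_id.encode("utf-8")).hexdigest()
    return int(digest[:16], 16)


def sample_scenario(task_id: str) -> Dict[str, Any]:
    """Hypothetical stand-in for generate_task: draws values from a private RNG."""
    # A dedicated Random instance keeps the draw independent of global RNG state.
    rng = random.Random(seed_from_task_id(task_id))
    return {
        "max_steps": rng.randint(12, 18),
        "impact_per_step": round(rng.uniform(0.01, 0.03), 3),
        "fake_hash": "".join(rng.choice("0123456789abcdef") for _ in range(32)),
    }


# Two calls with the same ID yield identical dictionaries.
print(sample_scenario("gen_0001") == sample_scenario("gen_0001"))
```

Because the seed comes from SHA-256 rather than Python's built-in `hash()`, it does not depend on per-process string hash randomization.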

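
As a quick check of the numeric thresholds in `_get_difficulty`, the sketch below tallies how the default 1000 generated IDs fall into buckets. `bucket_for_number` and `bucket_counts` are hypothetical helpers that copy only the `gen_NNNN` branch, not the prefix or fallback logic.

```python
from collections import Counter


def bucket_for_number(num: int) -> str:
    """Hypothetical helper using the same thresholds as the gen_NNNN branch."""
    if num <= 333:
        return "easy"
    if num <= 666:
        return "medium"
    return "hard"


def bucket_counts(count: int = 1000) -> Counter:
    """Tally difficulty buckets for IDs gen_0001 through gen_<count>."""
    return Counter(bucket_for_number(i) for i in range(1, count + 1))


print(dict(bucket_counts()))
```

With the default count the split is 333 easy, 333 medium, and 334 hard, so the three tiers are close to evenly represented.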
server/tasks.py ADDED

	@@ -0,0 +1,513 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+"""
+Deterministic task definitions for CyberSOCEnv.
+Each task defines a fixed attack chain, network layout, and expected
+containment actions. No randomness — every run of the same task_id
+produces identical initial state.
+Tasks:
+    - easy:   Single ransomware endpoint on the corporate subnet.
+    - medium: Multi-stage lateral movement (phishing -> cred theft -> 3 subnets).
+    - hard:   APT + ransomware with C2, exfiltration, and executive pressure.
+"""
+from __future__ import annotations
+from typing import Any, Dict, List
+# =============================================================================
+# Network Topology Builder (deterministic, 400-node)
+# =============================================================================
+def _build_subnet(
+    name: str,
+    role: str,
+    prefix: str,
+    ip_base: str,
+    count: int,
+    start_idx: int,
+    criticality: float,
+    default_ports: List[int],
+    default_procs: List[str],
+) -> List[Dict[str, Any]]:
+    """Build a list of host dicts for a subnet."""
+    hosts = []
+    for i in range(count):
+        idx = start_idx + i
+        hosts.append({
+            "hostname": f"{prefix}-{idx:03d}",
+            "ip_address": f"{ip_base}.{idx}",
+            "subnet": name,
+            "role": role,
+            "status": "online",
+            "running_processes": list(default_procs),
+            "open_ports": list(default_ports),
+            "criticality": criticality,
+        })
+    return hosts
+def build_network() -> Dict[str, List[Dict[str, Any]]]:
+    """Build the deterministic 400-node enterprise network (150 + 100 + 50 + 30 + 50 + 20 hosts).
+    Returns:
+        Dict mapping subnet name -> list of host dicts.
+    """
+    network: Dict[str, List[Dict[str, Any]]] = {}
+    # Corporate (150 workstations)
+    network["corporate"] = _build_subnet(
+        name="corporate", role="corporate", prefix="WS",
+        ip_base="10.1.1", count=150, start_idx=1,
+        criticality=0.3,
+        default_ports=[135, 445, 3389],
+        default_procs=["outlook.exe", "chrome.exe", "explorer.exe"],
+    )
+    # Engineering (100 dev machines)
+    network["engineering"] = _build_subnet(
+        name="engineering", role="engineering", prefix="DEV",
+        ip_base="10.2.1", count=100, start_idx=1,
+        criticality=0.5,
+        default_ports=[22, 443, 8080, 3389],
+        default_procs=["vscode.exe", "python.exe", "docker.exe", "git.exe"],
+    )
+    # Finance (50 machines)
+    network["finance"] = _build_subnet(
+        name="finance", role="finance", prefix="FIN",
+        ip_base="10.3.1", count=50, start_idx=1,
+        criticality=0.8,
+        default_ports=[443, 1433, 3389],
+        default_procs=["excel.exe", "sap.exe", "sqlcmd.exe"],
+    )
+    # DMZ (30 servers)
+    network["dmz"] = _build_subnet(
+        name="dmz", role="dmz", prefix="DMZ",
+        ip_base="10.4.1", count=30, start_idx=1,
+        criticality=0.6,
+        default_ports=[80, 443, 8443],
+        default_procs=["nginx", "node", "java"],
+    )
+    # Datacenter (50 servers)
+    network["datacenter"] = _build_subnet(
+        name="datacenter", role="datacenter", prefix="SRV",
+        ip_base="10.5.1", count=50, start_idx=1,
+        criticality=0.9,
+        default_ports=[22, 443, 5432, 6379, 9200],
+        default_procs=["postgres", "redis-server", "elasticsearch", "kubelet"],
+    )
+    # Executive (20 machines)
+    network["executive"] = _build_subnet(
+        name="executive", role="executive", prefix="EXEC",
+        ip_base="10.6.1", count=20, start_idx=1,
+        criticality=1.0,
+        default_ports=[443, 3389],
+        default_procs=["outlook.exe", "teams.exe", "chrome.exe"],
+    )
+    return network
+# =============================================================================
+# Attack Chain Definitions
+# =============================================================================
+TASKS: Dict[str, Dict[str, Any]] = {
+    # ----- EASY: Single ransomware endpoint -----
+    "easy": {
+        "description": "Ransomware detected on a single corporate workstation. Isolate and contain.",
+        "max_steps": 15,
+        "initial_business_impact": 0.05,
+        "impact_per_step": 0.02,  # Impact grows slowly per step
+        "attack_chain": [
+            {
+                "threat_id": "T-EASY-001",
+                "threat_type": "ransomware",
+                "phase": "execution",
+                "compromised_hosts": ["WS-042"],
+                "malicious_processes": ["cryptolocker.exe"],
+                "c2_servers": [],
+                "iocs": {
+                    "hashes": ["e99a18c428cb38d5f260853678922e03"],
+                    "ips": [],
+                    "domains": [],
+                },
+                "lateral_targets": [],
+                "exfil_targets": [],
+            },
+        ],
+        "initial_alerts": [
+            {
+                "alert_id": "ALERT-E001",
+                "timestamp": "2025-01-15T09:23:17Z",
+                "source_host": "WS-042",
+                "severity": "critical",
+                "threat_type": "ransomware",
+                "description": "EDR detected file encryption activity on WS-042. Process 'cryptolocker.exe' is encrypting files in C:\\Users\\jsmith\\Documents.",
+                "ioc_indicators": ["e99a18c428cb38d5f260853678922e03"],
+                "subnet": "corporate",
+                "is_acknowledged": False,
+            },
+            {
+                "alert_id": "ALERT-E002",
+                "timestamp": "2025-01-15T09:23:45Z",
+                "source_host": "WS-042",
+                "severity": "high",
+                "threat_type": "ransomware",
+                "description": "Anomalous file system activity: 147 files renamed with .locked extension in 28 seconds.",
+                "ioc_indicators": [],
+                "subnet": "corporate",
+                "is_acknowledged": False,
+            },
+        ],
+        # Optimal containment: kill process, run forensics, block hash, submit plan
+        "optimal_actions": ["kill_process", "run_forensics", "block_ioc", "submit_containment_plan"],
+        "containment_requirements": {
+            "must_kill": [{"hostname": "WS-042", "process": "cryptolocker.exe"}],
+            "must_block_iocs": ["e99a18c428cb38d5f260853678922e03"],
+            "must_forensics": ["WS-042"],
+            "must_not_isolate": ["finance", "engineering", "datacenter"],  # Unnecessary isolation = downtime
+        },
+    },
+    # ----- MEDIUM: Multi-stage lateral movement -----
+    "medium": {
+        "description": "Phishing attack led to credential theft and lateral movement across 3 subnets.",
+        "max_steps": 25,
+        "initial_business_impact": 0.10,
+        "impact_per_step": 0.03,
+        "attack_chain": [
+            {
+                "threat_id": "T-MED-001",
+                "threat_type": "phishing",
+                "phase": "initial_access",
+                "compromised_hosts": ["WS-017"],
+                "malicious_processes": ["powershell.exe"],
+                "c2_servers": [],
+                "iocs": {
+                    "hashes": ["d41d8cd98f00b204e9800998ecf8427e"],
+                    "ips": [],
+                    "domains": ["evil-login.example.com"],
+                },
+                "lateral_targets": [],
+                "exfil_targets": [],
+            },
+            {
+                "threat_id": "T-MED-002",
+                "threat_type": "credential_theft",
+                "phase": "credential_access",
+                "compromised_hosts": ["WS-017"],
+                "malicious_processes": ["mimikatz.exe"],
+                "c2_servers": [],
+                "iocs": {
+                    "hashes": ["aabbccdd11223344eeff5566778899aa"],
+                    "ips": [],
+                    "domains": [],
+                },
+                "lateral_targets": ["DEV-033", "FIN-012"],
+                "exfil_targets": [],
+            },
+            {
+                "threat_id": "T-MED-003",
+                "threat_type": "lateral_movement",
+                "phase": "lateral_movement",
+                "compromised_hosts": ["DEV-033", "FIN-012"],
+                "malicious_processes": ["svchost_backdoor.exe"],
+                "c2_servers": [],
+                "iocs": {
+                    "hashes": ["112233445566778899aabbccddeeff00"],
+                    "ips": ["203.0.113.50"],
+                    "domains": [],
+                },
+                "lateral_targets": ["SRV-005"],
+                "exfil_targets": [],
+            },
+        ],
+        "initial_alerts": [
+            {
+                "alert_id": "ALERT-M001",
+                "timestamp": "2025-01-15T08:15:00Z",
+                "source_host": "WS-017",
+                "severity": "medium",
+                "threat_type": "phishing",
+                "description": "User clicked suspicious link in email. PowerShell execution detected downloading payload from evil-login.example.com.",
+                "ioc_indicators": ["evil-login.example.com"],
+                "subnet": "corporate",
+                "is_acknowledged": False,
+            },
+            {
+                "alert_id": "ALERT-M002",
+                "timestamp": "2025-01-15T08:32:00Z",
+                "source_host": "WS-017",
+                "severity": "high",
+                "threat_type": "credential_theft",
+                "description": "LSASS memory access detected — possible credential dumping via Mimikatz.",
+                "ioc_indicators": ["aabbccdd11223344eeff5566778899aa"],
+                "subnet": "corporate",
+                "is_acknowledged": False,
+            },
+            {
+                "alert_id": "ALERT-M003",
+                "timestamp": "2025-01-15T09:05:00Z",
+                "source_host": "DEV-033",
+                "severity": "high",
+                "threat_type": "lateral_movement",
+                "description": "Suspicious RDP login from WS-017 using admin credentials. New process svchost_backdoor.exe spawned.",
+                "ioc_indicators": ["203.0.113.50", "112233445566778899aabbccddeeff00"],
+                "subnet": "engineering",
+                "is_acknowledged": False,
+            },
+            {
+                "alert_id": "ALERT-M004",
+                "timestamp": "2025-01-15T09:12:00Z",
+                "source_host": "FIN-012",
+                "severity": "critical",
+                "threat_type": "lateral_movement",
+                "description": "Unauthorized access to FIN-012 from compromised credentials. Backdoor process active.",
+                "ioc_indicators": ["112233445566778899aabbccddeeff00"],
+                "subnet": "finance",
+                "is_acknowledged": False,
+            },
+        ],
+        "optimal_actions": [
+            "query_host", "run_forensics", "kill_process", "block_ioc",
+            "isolate_segment", "run_forensics", "submit_containment_plan",
+        ],
+        "containment_requirements": {
+            "must_kill": [
+                {"hostname": "WS-017", "process": "powershell.exe"},
+                {"hostname": "WS-017", "process": "mimikatz.exe"},
+                {"hostname": "DEV-033", "process": "svchost_backdoor.exe"},
+                {"hostname": "FIN-012", "process": "svchost_backdoor.exe"},
+            ],
+            "must_block_iocs": [
+                "evil-login.example.com",
+                "203.0.113.50",
+                "d41d8cd98f00b204e9800998ecf8427e",
+                "aabbccdd11223344eeff5566778899aa",
+                "112233445566778899aabbccddeeff00",
+            ],
+            "must_forensics": ["WS-017", "DEV-033", "FIN-012"],
+            "must_not_isolate": ["executive", "datacenter"],
+        },
+    },
+    # ----- HARD: APT + Ransomware, C2, exfiltration, executive pressure -----
+    "hard": {
+        "description": "Advanced Persistent Threat with active C2 comms, data exfiltration in progress, and ransomware deployment imminent. Board is watching — minimize downtime.",
+        "max_steps": 30,
+        "initial_business_impact": 0.20,
+        "impact_per_step": 0.04,
+        "attack_chain": [
+            {
+                "threat_id": "T-HARD-001",
+                "threat_type": "phishing",
+                "phase": "initial_access",
+                "compromised_hosts": ["EXEC-003"],
+                "malicious_processes": ["outlook_macro.exe"],
+                "c2_servers": ["198.51.100.77"],
+                "iocs": {
+                    "hashes": ["deadbeef0123456789abcdef01234567"],
+                    "ips": ["198.51.100.77"],
+                    "domains": ["cdn-update.malware-c2.net"],
+                },
+                "lateral_targets": ["WS-088"],
+                "exfil_targets": [],
+            },
+            {
+                "threat_id": "T-HARD-002",
+                "threat_type": "c2_communication",
+                "phase": "command_and_control",
+                "compromised_hosts": ["EXEC-003", "WS-088"],
+                "malicious_processes": ["svchost_c2.exe"],
+                "c2_servers": ["198.51.100.77"],
+                "iocs": {
+                    "hashes": ["cafebabe9876543210fedcba98765432"],
+                    "ips": ["198.51.100.77"],
+                    "domains": ["cdn-update.malware-c2.net", "exfil.malware-c2.net"],
+                },
+                "lateral_targets": ["SRV-002", "FIN-008"],
+                "exfil_targets": [],
+            },
+            {
+                "threat_id": "T-HARD-003",
+                "threat_type": "privilege_escalation",
+                "phase": "privilege_escalation",
+                "compromised_hosts": ["SRV-002"],
+                "malicious_processes": ["exploit_kernel.exe"],
+                "c2_servers": ["198.51.100.77"],
+                "iocs": {
+                    "hashes": ["1122334455667788aabbccddeeff0011"],
+                    "ips": ["198.51.100.77"],
+                    "domains": [],
+                },
+                "lateral_targets": ["SRV-010", "SRV-015"],
+                "exfil_targets": ["SRV-002"],
+            },
+            {
+                "threat_id": "T-HARD-004",
+                "threat_type": "data_exfiltration",
+                "phase": "exfiltration",
+                "compromised_hosts": ["SRV-002", "FIN-008"],
+                "malicious_processes": ["data_pump.exe"],
+                "c2_servers": ["198.51.100.77"],
+                "iocs": {
+                    "hashes": ["ffeeddccbbaa99887766554433221100"],
+                    "ips": ["198.51.100.77", "203.0.113.99"],
+                    "domains": ["exfil.malware-c2.net"],
+                },
+                "lateral_targets": [],
+                "exfil_targets": ["SRV-002", "FIN-008"],
+            },
+            {
+                "threat_id": "T-HARD-005",
+                "threat_type": "ransomware",
+                "phase": "impact",
+                "compromised_hosts": ["SRV-010", "SRV-015"],
+                "malicious_processes": ["blackcat_ransom.exe"],
+                "c2_servers": [],
+                "iocs": {
+                    "hashes": ["aabb0011ccdd2233eeff4455667788"],
+                    "ips": [],
+                    "domains": [],
+                },
+                "lateral_targets": [],
+                "exfil_targets": [],
+            },
+        ],
+        "initial_alerts": [
+            {
+                "alert_id": "ALERT-H001",
+                "timestamp": "2025-01-15T06:00:00Z",
+                "source_host": "EXEC-003",
+                "severity": "medium",
+                "threat_type": "phishing",
+                "description": "Executive VP opened macro-enabled document. Outbound connection to cdn-update.malware-c2.net detected.",
+                "ioc_indicators": ["cdn-update.malware-c2.net", "198.51.100.77"],
+                "subnet": "executive",
+                "is_acknowledged": False,
+            },
+            {
+                "alert_id": "ALERT-H002",
+                "timestamp": "2025-01-15T06:45:00Z",
+                "source_host": "WS-088",
+                "severity": "high",
+                "threat_type": "c2_communication",
+                "description": "Periodic beaconing detected to 198.51.100.77 every 60 seconds. Encrypted payload exchange observed.",
+                "ioc_indicators": ["198.51.100.77", "cafebabe9876543210fedcba98765432"],
+                "subnet": "corporate",
+                "is_acknowledged": False,
+            },
+            {
+                "alert_id": "ALERT-H003",
+                "timestamp": "2025-01-15T07:30:00Z",
+                "source_host": "SRV-002",
+                "severity": "critical",
+                "threat_type": "privilege_escalation",
+                "description": "Kernel exploit attempt on SRV-002 (database server). Process exploit_kernel.exe gained SYSTEM privileges.",
+                "ioc_indicators": ["1122334455667788aabbccddeeff0011"],
+                "subnet": "datacenter",
+                "is_acknowledged": False,
+            },
+            {
+                "alert_id": "ALERT-H004",
+                "timestamp": "2025-01-15T08:00:00Z",
+                "source_host": "SRV-002",
+                "severity": "critical",
+                "threat_type": "data_exfiltration",
+                "description": "Large data transfer (2.3 GB) to external IP 203.0.113.99 from database server SRV-002. Possible exfiltration of customer PII.",
+                "ioc_indicators": ["203.0.113.99", "exfil.malware-c2.net"],
+                "subnet": "datacenter",
+                "is_acknowledged": False,
+            },
+            {
+                "alert_id": "ALERT-H005",
+                "timestamp": "2025-01-15T08:10:00Z",
+                "source_host": "FIN-008",
+                "severity": "critical",
+                "threat_type": "data_exfiltration",
+                "description": "Financial records being staged for exfiltration on FIN-008. Process data_pump.exe accessing sensitive directories.",
+                "ioc_indicators": ["ffeeddccbbaa99887766554433221100"],
+                "subnet": "finance",
+                "is_acknowledged": False,
+            },
+            {
+                "alert_id": "ALERT-H006",
+                "timestamp": "2025-01-15T08:30:00Z",
+                "source_host": "SRV-010",
+                "severity": "critical",
+                "threat_type": "ransomware",
+                "description": "BlackCat ransomware deployment detected on SRV-010! File encryption starting on production storage.",
+                "ioc_indicators": ["aabb0011ccdd2233eeff4455667788"],
+                "subnet": "datacenter",
+                "is_acknowledged": False,
+            },
+        ],
+        "optimal_actions": [
+            "block_ioc", "kill_process", "run_forensics", "isolate_segment",
+            "kill_process", "block_ioc", "run_forensics", "kill_process",
+            "submit_containment_plan",
+        ],
+        "containment_requirements": {
+            "must_kill": [
+                {"hostname": "EXEC-003", "process": "outlook_macro.exe"},
+                {"hostname": "EXEC-003", "process": "svchost_c2.exe"},
+                {"hostname": "WS-088", "process": "svchost_c2.exe"},
+                {"hostname": "SRV-002", "process": "exploit_kernel.exe"},
+                {"hostname": "SRV-002", "process": "data_pump.exe"},
+                {"hostname": "FIN-008", "process": "data_pump.exe"},
+                {"hostname": "SRV-010", "process": "blackcat_ransom.exe"},
+                {"hostname": "SRV-015", "process": "blackcat_ransom.exe"},
+            ],
+            "must_block_iocs": [
+                "198.51.100.77",
+                "203.0.113.99",
+                "cdn-update.malware-c2.net",
+                "exfil.malware-c2.net",
+                "deadbeef0123456789abcdef01234567",
+                "cafebabe9876543210fedcba98765432",
+            ],
+            "must_forensics": ["EXEC-003", "WS-088", "SRV-002", "FIN-008", "SRV-010"],
+            "must_not_isolate": [],  # In APT scenario, any isolation decision is valid
+        },
+    },
+}
+def get_task(task_id: str) -> Dict[str, Any]:
+    """Retrieve a task definition by ID.
+    Supports:
+        - 'easy', 'medium', 'hard': Hand-crafted curated benchmarks
+        - 'gen_0001' through 'gen_1000': Procedurally generated scenarios
+        - Any other string: Generated on-the-fly via seeded procedural generation
+    Args:
+        task_id: Task identifier string.
+    Returns:
+        Task definition dict.
+    """
+    # Check hand-crafted tasks first
+    if task_id in TASKS:
+        return TASKS[task_id]
+    # Fall back to procedural generation
+    try:
+        from .task_generator import generate_task
+    except ImportError:
+        from server.task_generator import generate_task
+    return generate_task(task_id)
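
The `hard` task repeats the same hosts, processes, and IOCs across `attack_chain` and `containment_requirements`, so a small cross-check may help catch drift between the two. The following is a minimal sketch and not part of the commit: `check_task`, `HEX32`, and the trimmed `sample_task` are hypothetical names. It assumes that every process listed in a stage runs on every host listed in that stage, and that IOC hashes are meant to be MD5-style (32 hex characters).

```python
import re
from typing import Any, Dict, List

# Assumption: IOC hashes are meant to be MD5-style (32 lowercase hex characters).
HEX32 = re.compile(r"[0-9a-f]{32}")


def check_task(task: Dict[str, Any]) -> List[str]:
    """Return a list of inconsistencies found in one task definition."""
    problems: List[str] = []
    known_pairs = set()  # (hostname, process) pairs seen in the attack chain
    known_iocs = set()   # every hash, IP, and domain seen in the attack chain

    for stage in task["attack_chain"]:
        # Assumption: each listed process runs on each listed host of the stage.
        for host in stage["compromised_hosts"]:
            for proc in stage["malicious_processes"]:
                known_pairs.add((host, proc))
        for values in stage["iocs"].values():
            known_iocs.update(values)
        for digest in stage["iocs"]["hashes"]:
            if not HEX32.fullmatch(digest):
                problems.append(
                    f"{stage['threat_id']}: hash {digest!r} is not 32 hex characters"
                )

    reqs = task["containment_requirements"]
    for entry in reqs["must_kill"]:
        if (entry["hostname"], entry["process"]) not in known_pairs:
            problems.append(f"must_kill entry not in attack chain: {entry}")
    for ioc in reqs["must_block_iocs"]:
        if ioc not in known_iocs:
            problems.append(f"must_block_iocs entry not in attack chain: {ioc}")
    return problems


# Trimmed copy of two stages from the "hard" task, for illustration only.
sample_task = {
    "attack_chain": [
        {
            "threat_id": "T-HARD-001",
            "compromised_hosts": ["EXEC-003"],
            "malicious_processes": ["outlook_macro.exe"],
            "iocs": {
                "hashes": ["deadbeef0123456789abcdef01234567"],
                "ips": ["198.51.100.77"],
                "domains": ["cdn-update.malware-c2.net"],
            },
        },
        {
            "threat_id": "T-HARD-005",
            "compromised_hosts": ["SRV-010", "SRV-015"],
            "malicious_processes": ["blackcat_ransom.exe"],
            "iocs": {
                "hashes": ["aabb0011ccdd2233eeff4455667788"],
                "ips": [],
                "domains": [],
            },
        },
    ],
    "containment_requirements": {
        "must_kill": [
            {"hostname": "EXEC-003", "process": "outlook_macro.exe"},
            {"hostname": "SRV-015", "process": "blackcat_ransom.exe"},
        ],
        "must_block_iocs": ["198.51.100.77", "cdn-update.malware-c2.net"],
    },
}

for problem in check_task(sample_task):
    print(problem)
```

On the trimmed sample, the only finding is the 30-character hash on `T-HARD-005`. Whether that length is intentional is not stated in the commit, so the check only reports it and changes nothing.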

uv.lock ADDED

The diff for this file is too large to render.

validate_submission.sh ADDED

	@@ -0,0 +1,159 @@

+#!/bin/bash
+set -uo pipefail
+DOCKER_BUILD_TIMEOUT=600
+if [ -t 1 ]; then
+  RED='\033[0;31m'
+  GREEN='\033[0;32m'
+  YELLOW='\033[1;33m'
+  BOLD='\033[1m'
+  NC='\033[0m'
+else
+  RED='' GREEN='' YELLOW='' BOLD='' NC=''
+fi
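+# Run a command with a time limit. Uses timeout or gtimeout when available,
+# otherwise falls back to a background watcher that kills the command.
+run_with_timeout() {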
+  local secs="$1"; shift
+  if command -v timeout &>/dev/null; then
+    timeout "$secs" "$@"
+  elif command -v gtimeout &>/dev/null; then
+    gtimeout "$secs" "$@"
+  else
+    "$@" &
+    local pid=$!
+    ( sleep "$secs" && kill "$pid" 2>/dev/null ) &
+    local watcher=$!
+    wait "$pid" 2>/dev/null
+    local rc=$?
+    kill "$watcher" 2>/dev/null
+    wait "$watcher" 2>/dev/null
+    return $rc
+  fi
+}
+portable_mktemp() {
+  local prefix="${1:-validate}"
+  mktemp "${TMPDIR:-/tmp}/${prefix}-XXXXXX" 2>/dev/null || mktemp
+}
+CLEANUP_FILES=()
+cleanup() { rm -f "${CLEANUP_FILES[@]+"${CLEANUP_FILES[@]}"}"; }
+trap cleanup EXIT
+PING_URL="${1:-}"
+REPO_DIR="${2:-.}"
+if [ -z "$PING_URL" ]; then
+  printf "Usage: %s <ping_url> [repo_dir]\n" "$0"
+  printf "\n"
+  printf "  ping_url   Your HuggingFace Space URL (e.g. https://your-space.hf.space)\n"
+  printf "  repo_dir   Path to your repo (default: current directory)\n"
+  exit 1
+fi
+if ! REPO_DIR="$(cd "$REPO_DIR" 2>/dev/null && pwd)"; then
+  printf "Error: directory '%s' not found\n" "${2:-.}"
+  exit 1
+fi
+PING_URL="${PING_URL%/}"
+export PING_URL
+PASS=0
+log()  { printf "[%s] %b\n" "$(date -u +%H:%M:%S)" "$*"; }
+pass() { log "${GREEN}PASSED${NC} -- $1"; PASS=$((PASS + 1)); }
+fail() { log "${RED}FAILED${NC} -- $1"; }
+hint() { printf "  ${YELLOW}Hint:${NC} %b\n" "$1"; }
+stop_at() {
+  printf "\n"
+  printf "${RED}${BOLD}Validation stopped at %s.${NC} Fix the above before continuing.\n" "$1"
+  exit 1
+}
+printf "\n"
+printf "${BOLD}========================================${NC}\n"
+printf "${BOLD}  OpenEnv Submission Validator${NC}\n"
+printf "${BOLD}========================================${NC}\n"
+log "Repo:     $REPO_DIR"
+log "Ping URL: $PING_URL"
+printf "\n"
+log "${BOLD}Step 1/3: Pinging HF Space${NC} ($PING_URL/reset) ..."
+CURL_OUTPUT=$(portable_mktemp "validate-curl")
+CLEANUP_FILES+=("$CURL_OUTPUT")
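+# On a connection failure curl already prints 000 via -w, so assign the
+# fallback instead of appending a second 000 inside the substitution.
+HTTP_CODE=$(curl -s -o "$CURL_OUTPUT" -w "%{http_code}" -X POST \
+  -H "Content-Type: application/json" -d '{}' \
+  "$PING_URL/reset" --max-time 30 2>"$CURL_OUTPUT") || HTTP_CODE="000"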
+if [ "$HTTP_CODE" = "200" ]; then
+  pass "HF Space is live and responds to /reset"
+elif [ "$HTTP_CODE" = "000" ]; then
+  fail "HF Space not reachable (connection failed or timed out)"
+  hint "Check your network connection and that the Space is running."
+  hint "Try: curl -s -o /dev/null -w '%{http_code}' -X POST $PING_URL/reset"
+  stop_at "Step 1"
+else
+  fail "HF Space /reset returned HTTP $HTTP_CODE (expected 200)"
+  hint "Make sure your Space is running and the URL is correct."
+  hint "Try opening $PING_URL in your browser first."
+  stop_at "Step 1"
+fi
+log "${BOLD}Step 2/3: Running docker build${NC} ..."
+if ! command -v docker &>/dev/null; then
+  fail "docker command not found"
+  hint "Install Docker: https://docs.docker.com/get-docker/"
+  stop_at "Step 2"
+fi
+if [ -f "$REPO_DIR/Dockerfile" ]; then
+  DOCKER_CONTEXT="$REPO_DIR"
+elif [ -f "$REPO_DIR/server/Dockerfile" ]; then
+  DOCKER_CONTEXT="$REPO_DIR/server"
+else
+  fail "No Dockerfile found in repo root or server/ directory"
+  stop_at "Step 2"
+fi
+log "  Found Dockerfile in $DOCKER_CONTEXT"
+BUILD_OK=false
+BUILD_OUTPUT=$(run_with_timeout "$DOCKER_BUILD_TIMEOUT" docker build "$DOCKER_CONTEXT" 2>&1) && BUILD_OK=true
+if [ "$BUILD_OK" = true ]; then
+  pass "Docker build succeeded"
+else
+  fail "Docker build failed (timeout=${DOCKER_BUILD_TIMEOUT}s)"
+  printf "%s\n" "$BUILD_OUTPUT" | tail -20
+  stop_at "Step 2"
+fi
+log "${BOLD}Step 3/3: Running openenv validate${NC} ..."
+if ! command -v openenv &>/dev/null; then
+  fail "openenv command not found"
+  hint "Install it: pip install openenv-core"
+  stop_at "Step 3"
+fi
+VALIDATE_OK=false
+VALIDATE_OUTPUT=$(cd "$REPO_DIR" && openenv validate 2>&1) && VALIDATE_OK=true
+if [ "$VALIDATE_OK" = true ]; then
+  pass "openenv validate passed"
+  [ -n "$VALIDATE_OUTPUT" ] && log "  $VALIDATE_OUTPUT"
+else
+  fail "openenv validate failed"
+  printf "%s\n" "$VALIDATE_OUTPUT"
+  stop_at "Step 3"
+fi
+printf "\n"
+printf "${BOLD}========================================${NC}\n"
+printf "${GREEN}${BOLD}  All 3/3 checks passed!${NC}\n"
+printf "${GREEN}${BOLD}  Your submission is ready to submit.${NC}\n"
+printf "${BOLD}========================================${NC}\n"
+printf "\n"
+exit 0
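
Step 1 of the script comes down to this: POST an empty JSON object to `<ping_url>/reset`, treat HTTP 200 as live, treat a connection failure as code `000`, and treat any other status as a misconfigured Space. A rough Python equivalent, for environments without `curl`, might look like the sketch below. `post_reset` and `classify` are hypothetical names that do not exist in the repository, and only the standard library is used.

```python
import urllib.error
import urllib.request


def post_reset(ping_url: str, timeout: float = 30.0) -> str:
    """POST '{}' to <ping_url>/reset and return the status code as a string.

    Returns "000" when the request cannot be completed, mirroring the
    placeholder code the shell script compares against.
    Note: unlike the plain curl call, urlopen follows redirects.
    """
    try:
        req = urllib.request.Request(
            ping_url.rstrip("/") + "/reset",
            data=b"{}",
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return str(resp.status)
    except urllib.error.HTTPError as exc:
        # The server answered, but with a 4xx/5xx status.
        return str(exc.code)
    except (urllib.error.URLError, OSError, ValueError):
        # DNS failure, refused connection, timeout, or malformed URL.
        return "000"


def classify(http_code: str) -> str:
    """Map a status code onto the three branches of Step 1."""
    if http_code == "200":
        return "pass"
    if http_code == "000":
        return "unreachable"
    return "bad_status"
```

A caller could run `classify(post_reset("https://your-space.hf.space"))` and stop on anything other than `"pass"`, which matches the script's behavior of halting at Step 1.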