
☁️🛡️ CloudSecurityAuditor: OpenEnv Environment

Complete Application Documentation


1. What Is This Application?

CloudSecurityAuditor is a standardized AI agent environment that simulates real-world cloud security auditing scenarios. It is built using the OpenEnv specification, an open standard for creating reproducible, programmable environments where AI agents can be trained, tested, and benchmarked.

Think of it as a virtual cybersecurity lab: instead of risking real cloud infrastructure, an AI agent (or a human) can interact with a mock cloud environment that contains intentional security vulnerabilities. The agent must discover, analyze, and remediate those vulnerabilities to earn a reward.

Who Is This For?

| Audience | Use Case |
| --- | --- |
| AI Researchers | Benchmark LLM-based security agents on structured tasks |
| Security Engineers | Practice cloud audit workflows in a safe sandbox |
| Students | Learn about S3 public buckets, EC2 security groups, and IAM log analysis |
| Hackathon Participants | Demonstrate agent-environment interaction for Meta/OpenEnv challenges |

2. Architecture Overview

┌─────────────────────────────────────────────────────┐
│                    BROWSER (UI)                     │
│  ┌──────────┐  ┌──────────────┐  ┌──────────────┐   │
│  │ Sidebar  │  │ Resource Grid│  │ Execution Log│   │
│  │ (Tasks)  │  │ (S3 / EC2)   │  │ (Terminal)   │   │
│  └──────────┘  └──────────────┘  └──────────────┘   │
└──────────────────────────┬──────────────────────────┘
                           │  HTTP (REST)
                           ▼
┌─────────────────────────────────────────────────────┐
│               FastAPI Server (app.py)               │
│  ┌─────────┐  ┌──────────┐  ┌───────────────────┐   │
│  │ /reset  │  │  /step   │  │  /state  /  /docs │   │
│  └────┬────┘  └────┬─────┘  └───────────────────┘   │
│       │            │                                │
│       ▼            ▼                                │
│  ┌─────────────────────────────────────────────┐    │
│  │       CloudAuditEnv (environment.py)        │    │
│  │  ┌─────────┐  ┌────────┐  ┌──────────────┐  │    │
│  │  │ S3 Data │  │EC2 Data│  │ Auth Logs    │  │    │
│  │  └─────────┘  └────────┘  └──────────────┘  │    │
│  └─────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────┘

3. File Structure

scaler/
├── server/
│   ├── app.py              # FastAPI entry point, static file serving
│   ├── environment.py      # Core environment logic (reset, step, state)
│   ├── models.py           # Pydantic/dataclass models (Action, Observation, State)
│   ├── tasks.py            # Task definitions (Easy, Medium, Hard)
│   └── static/
│       ├── index.html      # Dashboard UI layout
│       ├── index.css       # Dark-mode cybersecurity theme
│       └── app.js          # Frontend logic & API interaction
├── scripts/
│   └── baseline_inference.py   # Example agent that solves the Easy task
├── openenv.yaml            # OpenEnv specification file
├── requirements.txt        # Python dependencies
├── Dockerfile              # Docker deployment configuration
└── README.md               # Quick-start guide

4. The Environment Engine (environment.py)

The heart of the application is the CloudAuditEnv class. It implements three methods required by the OpenEnv spec:

reset(task_id) → CloudObservation

  • Reinitializes the mock infrastructure (S3 buckets, EC2 instances, auth logs).
  • Sets the active task (easy, medium, or hard).
  • Returns an initial observation whose info field confirms the selected task.

step(action) → CloudObservation

  • Accepts a CloudAction and executes it against the mock infrastructure.
  • Returns an updated CloudObservation containing discovered resources, details, logs, and a reward signal.
  • Automatically terminates the episode after 20 steps (truncation).

state() → CloudState

  • Returns internal metadata: episode ID, step count, task ID, completion status, and cumulative score.
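The lifecycle above can be sketched in a few lines of Python. This is not the code in environment.py. MiniAuditEnv, SketchState, the plain-dict observations, and the exact-match grading are all assumptions made for illustration. Only the method names, the 20-step limit, and the state fields come from this document.

```python
import uuid
from dataclasses import dataclass
from typing import Any, Dict

MAX_STEPS = 20  # documented truncation limit

# Documented submit answers for Easy and Hard (Medium is completed via modify)
SUBMIT_ANSWERS = {"easy": "prod-data-001", "hard": "192.168.1.50"}


@dataclass
class SketchState:
    """Mirrors the fields returned by GET /state."""
    episode_id: str = ""
    step_count: int = 0
    task_id: str = "easy"
    is_completed: bool = False
    score: float = 0.0


class MiniAuditEnv:
    """Hypothetical stand-in for CloudAuditEnv that models only the episode lifecycle."""

    def __init__(self) -> None:
        self._state = SketchState()

    def reset(self, task_id: str = "easy") -> Dict[str, Any]:
        # New episode: fresh ID and zeroed counters (mock infrastructure omitted here)
        self._state = SketchState(episode_id=str(uuid.uuid4()), task_id=task_id)
        return {"info": f"Environment reset. Task: {task_id}", "reward": 0.0, "done": False}

    def step(self, action: Dict[str, Any]) -> Dict[str, Any]:
        s = self._state
        if s.is_completed or s.step_count >= MAX_STEPS:
            # No further transitions once the episode has ended
            return {"status": "Episode already finished.", "reward": 0.0, "done": True}
        s.step_count += 1
        reward = 0.0
        if action.get("action") == "submit":
            # Placeholder grading: exact match against the documented answer
            reward = 1.0 if action.get("answer") == SUBMIT_ANSWERS.get(s.task_id) else 0.0
            s.is_completed = True
        s.score += reward
        done = s.is_completed or s.step_count >= MAX_STEPS  # submission or truncation
        return {"status": f"Step {s.step_count} executed.", "reward": reward, "done": done}

    def state(self) -> SketchState:
        return self._state
```

Computing done from both is_completed and the step count keeps a submitted episode distinguishable from one that was cut off at the step limit.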

5. Mock Infrastructure

The environment simulates the following cloud resources:

S3 Buckets (3 total)

| ID | Region | Public? | Environment |
| --- | --- | --- | --- |
| prod-data-001 | us-east-1 | ✅ Yes | prod |
| prod-logs-002 | us-east-1 | ❌ No | prod |
| dev-test-01 | us-west-2 | ✅ Yes | dev |

EC2 Instances (2 total)

| ID | Type | State | Environment | Open Ports |
| --- | --- | --- | --- | --- |
| i-0abcdef1234567890 | t2.micro | running | dev | 22 (SSH), 3389 (RDP) ⚠️ |
| i-0987654321fedcba0 | m5.large | running | prod | 443 (HTTPS) |

Auth Logs (auth-logs)

| Timestamp | User | Action | IP |
| --- | --- | --- | --- |
| 2026-04-05T10:00:00Z | admin | Login | 1.1.1.1 |
| 2026-04-05T10:15:00Z | iam-role-01 | DeleteStorage ⚠️ | 192.168.1.50 |
| 2026-04-05T10:30:00Z | user-02 | ListBuckets | 2.2.2.2 |
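reset() is described as reinitializing the mock infrastructure. One way to make every episode start from exactly the data in the tables above is to keep a pristine fixture and deep-copy it on each reset. Then a modify in one episode cannot leak into the next. This is a sketch that rests on several assumptions:

* PRISTINE_INFRA and fresh_infrastructure are hypothetical names.
* The dictionary layout, including an env tag and a rules list on each EC2 instance, is assumed.
* Only the RDP rule's 0.0.0.0/0 exposure is documented. The other CIDRs are placeholders.

```python
import copy
from typing import Any, Dict

# Values mirror the tables above; the layout and non-RDP CIDRs are assumptions.
PRISTINE_INFRA: Dict[str, Any] = {
    "s3": [
        {"id": "prod-data-001", "region": "us-east-1", "public": True, "tags": {"env": "prod"}},
        {"id": "prod-logs-002", "region": "us-east-1", "public": False, "tags": {"env": "prod"}},
        {"id": "dev-test-01", "region": "us-west-2", "public": True, "tags": {"env": "dev"}},
    ],
    "ec2": [
        {"id": "i-0abcdef1234567890", "type": "t2.micro", "state": "running",
         "tags": {"env": "dev"},
         "rules": [{"port": 22, "cidr": "0.0.0.0/0"},      # SSH CIDR assumed
                   {"port": 3389, "cidr": "0.0.0.0/0"}]},  # documented RDP exposure
        {"id": "i-0987654321fedcba0", "type": "m5.large", "state": "running",
         "tags": {"env": "prod"},
         "rules": [{"port": 443, "cidr": "0.0.0.0/0"}]},   # HTTPS CIDR assumed
    ],
    "auth-logs": [
        {"timestamp": "2026-04-05T10:00:00Z", "user": "admin", "action": "Login", "ip": "1.1.1.1"},
        {"timestamp": "2026-04-05T10:15:00Z", "user": "iam-role-01", "action": "DeleteStorage", "ip": "192.168.1.50"},
        {"timestamp": "2026-04-05T10:30:00Z", "user": "user-02", "action": "ListBuckets", "ip": "2.2.2.2"},
    ],
}


def fresh_infrastructure() -> Dict[str, Any]:
    """Return an independent copy so each episode starts from identical data."""
    return copy.deepcopy(PRISTINE_INFRA)
```

A shallow copy would not be enough here. The nested rule lists would still be shared between episodes, so patching one instance would silently alter the fixture.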

6. Action Space

The agent interacts with the environment using a CloudAction object. Available action types:

| Action | Parameters | Description |
| --- | --- | --- |
| list | resource_type (s3, ec2) | Lists all resources of a given type |
| describe | resource_id | Returns full details for a specific resource |
| modify | resource_id, patch | Updates resource configuration (e.g., security group rules) |
| logs | resource_id (e.g., auth-logs) | Fetches log entries for a service |
| submit | answer | Submits the final answer for grading |
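The parameter requirements in this table can be captured in a small dataclass with a validation step. This is an illustrative sketch. The real CloudAction and CloudActionType live in models.py and may differ. The enum, the SketchAction class, and validate() are hypothetical. The field names are taken from the table and from the POST /step request example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict, Optional


class ActionType(str, Enum):
    LIST = "list"
    DESCRIBE = "describe"
    MODIFY = "modify"
    LOGS = "logs"
    SUBMIT = "submit"


# Parameters each action type requires, per the action table
REQUIRED_FIELDS = {
    ActionType.LIST: ("resource_type",),
    ActionType.DESCRIBE: ("resource_id",),
    ActionType.MODIFY: ("resource_id", "patch"),
    ActionType.LOGS: ("resource_id",),
    ActionType.SUBMIT: ("answer",),
}


@dataclass
class SketchAction:
    """Hypothetical stand-in for CloudAction; fields follow the action table."""
    action: ActionType
    resource_type: Optional[str] = None
    resource_id: Optional[str] = None
    patch: Optional[Dict[str, Any]] = None
    answer: Optional[str] = None

    def validate(self) -> None:
        kind = ActionType(self.action)  # also accepts plain strings such as "list"
        missing = [name for name in REQUIRED_FIELDS[kind] if getattr(self, name) is None]
        if missing:
            raise ValueError(f"{kind.value} requires: {', '.join(missing)}")
        if kind is ActionType.LIST and self.resource_type not in ("s3", "ec2"):
            raise ValueError("list supports resource_type 's3' or 'ec2' only")
```

Keeping the requirements in one table means a new action type needs only a new enum member and one REQUIRED_FIELDS entry.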

Example Actions (dashboard command syntax; through the API, the same actions are sent as JSON to POST /step)

# List all S3 buckets
list s3

# Describe an EC2 instance
describe i-0abcdef1234567890

# Fetch authentication logs
logs auth-logs

# Submit the answer for the Easy task
submit prod-data-001

# Submit the answer for the Hard task
submit 192.168.1.50
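The dashboard commands above map onto the JSON body of POST /step. A minimal parser might look like the sketch below. The function name parse_command is hypothetical. The field names for describe, logs, and submit (resource_id, answer) are inferred from the action table rather than taken from app.js. modify is left out because its patch argument is JSON rather than a single token.

```python
from typing import Any, Dict

# Which request field the single argument of each command fills
ARG_FIELD = {
    "list": "resource_type",
    "describe": "resource_id",
    "logs": "resource_id",
    "submit": "answer",
}


def parse_command(command: str) -> Dict[str, Any]:
    """Turn a dashboard command such as 'list s3' into a /step request body."""
    parts = command.strip().split(maxsplit=1)
    if len(parts) != 2:
        raise ValueError(f"expected '<action> <argument>', got {command!r}")
    verb, arg = parts[0].lower(), parts[1].strip()
    if verb not in ARG_FIELD:
        raise ValueError(f"unsupported command: {verb}")
    return {"action": {"action": verb, ARG_FIELD[verb]: arg}}
```

For list s3, this produces the same request body that the API reference shows for POST /step.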

7. Observation Space

Every step() and reset() returns a CloudObservation:

| Field | Type | Description |
| --- | --- | --- |
| resources | List[Dict] | List of discovered resource records |
| details | Dict | Full metadata for a single described resource |
| logs | List[Dict] | Log entries (timestamp, user, action, IP) |
| status | str | Human-readable status message |
| info | str | Additional context (e.g., grading feedback) |
| reward | float | Scalar reward (0.0 to 1.0) |
| done | bool | Whether the episode has ended |
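In the HTTP examples in the API reference, reward and done sit next to the observation object rather than inside it. Fields the server does not populate are null or omitted. A client that wants one object matching this table could flatten the response as sketched below. The Observation dataclass and from_response are hypothetical helpers, not code from this project.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class Observation:
    resources: Optional[List[Dict[str, Any]]] = None
    details: Optional[Dict[str, Any]] = None
    logs: Optional[List[Dict[str, Any]]] = None
    status: Optional[str] = None
    info: Optional[str] = None
    reward: float = 0.0
    done: bool = False


def from_response(body: Dict[str, Any]) -> Observation:
    """Merge the nested observation with the top-level reward/done fields."""
    obs = body.get("observation") or {}
    return Observation(
        resources=obs.get("resources"),
        details=obs.get("details"),
        logs=obs.get("logs"),
        status=obs.get("status"),
        info=obs.get("info"),
        reward=float(body.get("reward") or 0.0),  # tolerate null rewards
        done=bool(body.get("done", False)),
    )
```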

8. Tasks & Grading

Task 1 (Easy): S3 Public Audit

Goal: Identify all S3 buckets that are both public: true AND tagged env: prod.

| Step | Action | Expected Result |
| --- | --- | --- |
| 1 | list s3 | Returns 3 buckets |
| 2 | Filter for public + prod | prod-data-001 |
| 3 | submit prod-data-001 | Reward: 1.0 ✅ |
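Step 2 is a filter over the records returned by list s3. Here is a minimal sketch. It assumes the record shape shown in the POST /step response example: a boolean public field and an env key under tags. The function name is hypothetical.

```python
from typing import Any, Dict, List


def public_prod_buckets(buckets: List[Dict[str, Any]]) -> List[str]:
    """Return IDs of buckets that are public AND tagged env=prod."""
    return [
        b["id"]
        for b in buckets
        if b.get("public") is True and b.get("tags", {}).get("env") == "prod"
    ]


listed = [
    {"id": "prod-data-001", "region": "us-east-1", "public": True, "tags": {"env": "prod"}},
    {"id": "prod-logs-002", "region": "us-east-1", "public": False, "tags": {"env": "prod"}},
    {"id": "dev-test-01", "region": "us-west-2", "public": True, "tags": {"env": "dev"}},
]
print(public_prod_buckets(listed))  # ['prod-data-001']
```

With exactly one match, that ID is the answer submitted in step 3. This document does not specify how the grader expects several matching buckets to be formatted.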

Task 2 (Medium): EC2 Security Patch

Goal: Find EC2 instance i-0abcdef1234567890, which has port 3389 (RDP) open to 0.0.0.0/0, and close that port by modifying its security group so that only port 22 is allowed.

| Step | Action | Expected Result |
| --- | --- | --- |
| 1 | list ec2 | Returns 2 instances |
| 2 | describe i-0abcdef1234567890 | Shows RDP port open |
| 3 | modify i-0abcdef1234567890 with patch {"rules": [{"port": 22, "cidr": "0.0.0.0/0"}]} | Reward: 1.0 ✅ |
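Rather than typing the patch by hand, an agent can derive it from the rules returned by describe, dropping only the world-open RDP rule. This is a hypothetical sketch with three assumptions:

* Rules are dicts with port and cidr keys, matching the shape of the patch above.
* The patch replaces the whole rule list.
* The existing SSH rule's CIDR is 0.0.0.0/0. The document does not state this.

```python
from typing import Any, Dict, List


def is_world_open(rule: Dict[str, Any], port: int) -> bool:
    """True if the rule exposes `port` to any IPv4 address."""
    return rule.get("port") == port and rule.get("cidr") == "0.0.0.0/0"


def close_rdp(rules: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Build a replacement rule list without world-open RDP (3389)."""
    return {"rules": [r for r in rules if not is_world_open(r, 3389)]}


# Assumed current rules for i-0abcdef1234567890 (the SSH CIDR is not documented)
current_rules = [{"port": 22, "cidr": "0.0.0.0/0"}, {"port": 3389, "cidr": "0.0.0.0/0"}]
patch = close_rdp(current_rules)
print(patch)  # {'rules': [{'port': 22, 'cidr': '0.0.0.0/0'}]}

# /step body for the modify action (field names inferred from the action table)
request_body = {"action": {"action": "modify", "resource_id": "i-0abcdef1234567890", "patch": patch}}
```

Filtering the described rules, instead of hard-coding the result, leaves any other rules on the instance untouched.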

Task 3 (Hard): IAM Log Forensic

Goal: A rogue IAM role (iam-role-01) has performed unauthorized actions. Analyze the auth-logs to identify the IP address that performed DeleteStorage.

| Step | Action | Expected Result |
| --- | --- | --- |
| 1 | logs auth-logs | Returns 3 log entries |
| 2 | Find DeleteStorage action | IP: 192.168.1.50 |
| 3 | submit 192.168.1.50 | Reward: 1.0 ✅ |
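Step 2 amounts to scanning the entries for the DeleteStorage action, optionally restricted to the suspect role. Below is a sketch. The function name is hypothetical, and the key names (user, action, ip) are assumed for the fields listed in the observation table.

```python
from typing import Any, Dict, List, Optional


def ip_for_action(entries: List[Dict[str, Any]], action: str,
                  user: Optional[str] = None) -> Optional[str]:
    """Return the IP of the first entry matching `action` (and `user`, if given)."""
    for entry in entries:
        if entry.get("action") == action and (user is None or entry.get("user") == user):
            return entry.get("ip")
    return None


auth_logs = [
    {"timestamp": "2026-04-05T10:00:00Z", "user": "admin", "action": "Login", "ip": "1.1.1.1"},
    {"timestamp": "2026-04-05T10:15:00Z", "user": "iam-role-01", "action": "DeleteStorage", "ip": "192.168.1.50"},
    {"timestamp": "2026-04-05T10:30:00Z", "user": "user-02", "action": "ListBuckets", "ip": "2.2.2.2"},
]
print(ip_for_action(auth_logs, "DeleteStorage", user="iam-role-01"))  # 192.168.1.50
```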

9. API Reference

Base URL: http://localhost:7860

POST /reset

Reset the environment to a specific task.

Request:

{ "task_id": "easy" }

Response:

{
    "observation": {
        "resources": null,
        "details": null,
        "status": null,
        "logs": null,
        "info": "Environment reset. Task: easy"
    },
    "reward": 0.0,
    "done": false
}

POST /step

Execute an action in the environment.

Request:

{
    "action": {
        "action": "list",
        "resource_type": "s3"
    }
}

Response:

{
    "observation": {
        "resources": [
            { "id": "prod-data-001", "region": "us-east-1", "public": true, "tags": { "env": "prod" } },
            { "id": "prod-logs-002", "region": "us-east-1", "public": false, "tags": { "env": "prod" } },
            { "id": "dev-test-01", "region": "us-west-2", "public": true, "tags": { "env": "dev" } }
        ],
        "status": "Listed 3 s3 resources."
    },
    "reward": 0.0,
    "done": false
}

GET /state

Get internal environment state.

Response:

{
    "episode_id": "a1b2c3d4-...",
    "step_count": 3,
    "task_id": "easy",
    "is_completed": false,
    "score": 0.0
}

GET /docs

Interactive Swagger UI for API exploration.

GET /

Dashboard UI (the web interface).
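The endpoints above can be exercised from Python with only the standard library. The sketch below builds requests with urllib.request and plays the Easy task end to end. BASE_URL, build_request, call, and run_easy_task are hypothetical names. The submit body's answer field is inferred from the action table. The sketch also assumes a server is already running on port 7860.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

BASE_URL = "http://localhost:7860"


def build_request(path: str, payload: Optional[Dict[str, Any]] = None) -> urllib.request.Request:
    """POST with a JSON body when payload is given, otherwise GET."""
    if payload is None:
        return urllib.request.Request(BASE_URL + path, method="GET")
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call(path: str, payload: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(build_request(path, payload), timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def run_easy_task() -> float:
    """Reset, list S3 buckets, submit the public prod bucket, and return the reward."""
    call("/reset", {"task_id": "easy"})
    listed = call("/step", {"action": {"action": "list", "resource_type": "s3"}})
    buckets = listed["observation"]["resources"]
    targets = [b["id"] for b in buckets if b["public"] and b["tags"].get("env") == "prod"]
    result = call("/step", {"action": {"action": "submit", "answer": targets[0]}})
    print("state:", call("/state"))
    return result["reward"]
```

Calling run_easy_task() while the server is running returns the reward reported by the submit step. scripts/baseline_inference.py remains the project's own reference agent.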


10. Dashboard UI

The application includes a dark-mode cybersecurity dashboard, served at http://localhost:7860.

Features

  • Sidebar Task Selector: Switch between Easy, Medium, and Hard challenges with one click.
  • Infrastructure Overview: Visual resource cards for S3 buckets and EC2 instances. Vulnerable resources are highlighted with red borders and blinking status dots.
  • Execution Log: Terminal-style console showing timestamped action logs with color-coded entries (blue for actions, green for system, yellow for rewards, red for errors).
  • Manual Command Input: Type commands like list s3, describe i-0abcdef1234567890, logs auth-logs, or submit prod-data-001 directly in the dashboard.
  • Live Stats HUD: Displays current task name, cumulative reward, and environment status (Active/Completed).

Design

  • Theme: Cyber-noir dark mode with deep navy background (#0a0e14)
  • Accents: Neon cyan (#00f5ff) for primary elements
  • Typography: Inter (body), Outfit (headings), JetBrains Mono (code/logs)
  • Effects: Glassmorphism panels, fade-in card animations, pulsing vulnerability indicators

11. Running the Application

Local Development

# Install dependencies
pip install -r requirements.txt

# Start the server
python -m server.app

# Open in browser (macOS; on Linux use xdg-open)
open http://localhost:7860

Running the Baseline Agent

# Solves the Easy task automatically
python scripts/baseline_inference.py

Docker Deployment

# Build the image
docker build -t cloud-security-auditor .

# Run the container
docker run -p 7860:7860 cloud-security-auditor

Hugging Face Spaces Deployment

  1. Create a new Space on Hugging Face.
  2. Select Docker as the SDK.
  3. Upload the repository contents (including openenv.yaml and Dockerfile).
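  4. The Space builds the image from the Dockerfile and routes traffic to port 7860. This is the Docker Spaces default and matches the port declared in openenv.yaml.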

12. Technology Stack

| Component | Technology |
| --- | --- |
| Backend | Python 3.10, FastAPI, Uvicorn |
| Environment | openenv-core ≥ 0.1.1 |
| Data Models | Python dataclasses |
| Frontend | Vanilla HTML/CSS/JS |
| Fonts | Google Fonts (Inter, Outfit, JetBrains Mono) |
| Deployment | Docker, Hugging Face Spaces |

13. OpenEnv Specification (openenv.yaml)

name: cloud-security-auditor
version: "0.2.0"
description: "A real-world cloud security audit environment for AI agents."
hardware:
  tier: "cpu-small"
  vCPU: 2
  RAM: 4Gi
port: 7860
entrypoint: "uvicorn server.app:app --host 0.0.0.0 --port 7860"
tags:
  - security
  - cloud
  - task-based
evaluation:
  tasks:
    - id: "easy"
      name: "S3 Public Audit"
      difficulty: "easy"
    - id: "medium"
      name: "EC2 Security Patch"
      difficulty: "medium"
    - id: "hard"
      name: "IAM Log Forensic"
      difficulty: "hard"

14. Extending the Environment

Adding a New Task

  1. Add the task definition to server/tasks.py.
  2. Add the corresponding mock data to _initialize_state() in environment.py.
  3. Add the grading logic to the step() method under CloudActionType.SUBMIT.
  4. Add a new task button to index.html in the sidebar.
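Steps 1 and 3 touch different files, but a small registry that pairs each task ID with its grader can keep them in sync. The sketch below is hypothetical: TaskSpec, TASKS, register, exact, and grade are not names from tasks.py. The graders encode only the documented submit answers for the Easy and Hard tasks. The Medium task is left out because it is completed through modify rather than submit.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class TaskSpec:
    task_id: str
    name: str
    difficulty: str
    grader: Callable[[str], float]  # maps a submitted answer to a reward


def exact(expected: str) -> Callable[[str], float]:
    """Grader that awards 1.0 for an exact (whitespace-trimmed) match."""
    return lambda answer: 1.0 if answer.strip() == expected else 0.0


TASKS: Dict[str, TaskSpec] = {}


def register(spec: TaskSpec) -> None:
    """Add a task, refusing duplicate IDs so two graders never collide."""
    if spec.task_id in TASKS:
        raise ValueError(f"duplicate task id: {spec.task_id}")
    TASKS[spec.task_id] = spec


def grade(task_id: str, answer: str) -> float:
    return TASKS[task_id].grader(answer)


# IDs, names, and difficulties mirror openenv.yaml
register(TaskSpec("easy", "S3 Public Audit", "easy", exact("prod-data-001")))
register(TaskSpec("hard", "IAM Log Forensic", "hard", exact("192.168.1.50")))
```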

Adding a New Resource Type

  1. Add the resource data to self.resources in environment.py.
  2. Add a handler for CloudActionType.LIST and CloudActionType.DESCRIBE for the new type.
  3. Update detectResourceType() in app.js to render the correct card icon/label.
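Steps 1 and 2 collapse into a single change if LIST and DESCRIBE look resources up in a table keyed by resource type rather than branching on each type. The sketch below is a hypothetical illustration. The resources dict, handle_list, handle_describe, and the rds records are invented for the example. The status strings only imitate the format shown in the POST /step response.

```python
from typing import Any, Dict, List

# Hypothetical resource store: resource type -> records, each with an "id" key
resources: Dict[str, List[Dict[str, Any]]] = {
    "s3": [{"id": "prod-data-001", "public": True}],
    "ec2": [{"id": "i-0abcdef1234567890", "type": "t2.micro"}],
}


def handle_list(resource_type: str) -> Dict[str, Any]:
    items = resources.get(resource_type)
    if items is None:
        return {"resources": None, "status": f"Unknown resource type: {resource_type}"}
    return {"resources": items, "status": f"Listed {len(items)} {resource_type} resources."}


def handle_describe(resource_id: str) -> Dict[str, Any]:
    # Search every type so DESCRIBE needs no per-type code
    for items in resources.values():
        for item in items:
            if item["id"] == resource_id:
                return {"details": item, "status": f"Described {resource_id}."}
    return {"details": None, "status": f"Resource not found: {resource_id}"}


# Adding a new type is one data change; both handlers pick it up automatically.
resources["rds"] = [{"id": "db-prod-01", "engine": "postgres", "public": False}]
```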

Built for the Meta Hackathon / OpenEnv Challenge • April 2026