Spaces:
Running
Running
Fix Phase 2 OpenEnv validation traps: add grader paths to openenv.yaml and safe parameterless defaults
699f953 | spec_version: 1 | |
| name: multi-agent-dev-tools-env | |
| description: > | |
| A multi-domain RL environment for training AI agents on real-world developer | |
| and clinical tasks. Covers MCP security auditing, PyTorch migration debugging, | |
| and clinical workflow chaos recovery. 9 tasks across 3 domains with graded | |
| difficulty (easy/medium/hard). | |
| type: environment | |
| runtime: docker | |
| port: 7860 | |
| # Action and Observation spaces use typed Pydantic models | |
| # See server/models/ for full definitions | |
| tasks: | |
| - id: sec_easy | |
| grader: server.graders.security_grader.grade | |
| name: Single vulnerability classification | |
| difficulty: easy | |
| description: Identify vulnerability type, CVSS score, and severity from a tool-call snippet. | |
| - id: sec_medium | |
| grader: server.graders.security_grader.grade | |
| name: Vulnerability identification + fix proposal | |
| difficulty: medium | |
| description: Identify the vulnerability and propose a secure code fix. | |
| - id: sec_hard | |
| grader: server.graders.security_grader.grade | |
| name: Adversarial patch defense with reviewer feedback | |
| difficulty: hard | |
| description: Identify, fix, and iteratively revise based on reviewer feedback. | |
| - id: dep_easy | |
| grader: server.graders.dependency_grader.grade | |
| name: PyTorch 1.x deprecated API detection | |
| difficulty: easy | |
| description: Flag outdated packages and deprecated API usage. | |
| - id: dep_medium | |
| grader: server.graders.dependency_grader.grade | |
| name: Version conflict chain resolution | |
| difficulty: medium | |
| description: Resolve version conflicts using compatibility matrix constraints. | |
| - id: dep_hard | |
| grader: server.graders.dependency_grader.grade | |
| name: torch.compile graph-break hunter | |
| difficulty: hard | |
| description: Fix torch.compile graph-break patterns in dependency order. | |
| - id: cli_easy | |
| grader: server.graders.clinical_grader.grade | |
| name: Single workflow gap detection | |
| difficulty: easy | |
| description: Detect missing steps in a clinical workflow and assess risk. | |
| - id: cli_medium | |
| grader: server.graders.clinical_grader.grade | |
| name: Multi-gap priority ranking | |
| difficulty: medium | |
| description: Detect gaps and rank them by clinical priority. | |
| - id: cli_hard | |
| grader: server.graders.clinical_grader.grade | |
| name: Dependency-ordered recovery planning | |
| difficulty: hard | |
| description: Plan a dependency-safe recovery sequence for a disrupted clinical workflow. | |