Tests Map - tests/
365 tests across 18 files. All passing.
Last verified: 2026-03-08
Summary
| File | Tests | What it covers |
|---|---|---|
| test_api_rest_isolation.py | 11 | API 14 REST session isolation and replay separation |
| test_cache.py | 2 | Oracle scenario caching and reuse |
| test_client.py | 24 | TRN 13 reusable client over REST and WebSocket |
| test_config.py | 3 | Shared constants and config consistency |
| test_env.py | 56 | ENV 01-08, ENV 10, ENV 11, OBS 04, JDG 04-05, TST 01-03 |
| test_judge_policy.py | 10 | JDG 11 structured judge audit payload |
| test_lab_manager_policy.py | 37 | AGT 05-07 plus AGT 09 determinism coverage |
| test_models.py | 21 | Action, observation, step, state, and log contracts |
| test_logging.py | 11 | MOD 07 replay persistence and JDG 07 CSV logging helpers |
| test_oracle.py | 5 | Oracle hybrid wrapper, structured parsing, and env reset adapter |
| test_prompts.py | 7 | AGT 10 prompt files and Oracle prompt asset loading |
| test_reward.py | 40 | JDG 01-06, JDG 08, and reward regression coverage |
| test_rollout.py | 12 | TRN 03 rollout worker behavior |
| test_rollout_traces.py | 2 | TRN 04 bounded tool trace aggregation and batched collection |
| test_scenarios.py | 14 | SCN 01-13 scenario generation, determinism, and Oracle scenario adaptation |
| test_scientist_policy.py | 46 | MOD 09, AGT 01-04, AGT 08 |
| test_server.py | 44 | API 01-04, API 06-08, API 13-14, replay audit propagation, and root landing page |
| test_validation.py | 20 | MOD 05-06 semantic validation |
| Total | 365 | |
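The headline figure can be cross-checked against the per-file rows. The snippet below is a minimal sketch: the counts are copied by hand from the Summary table, and the `TEST_COUNTS` and `summarize` names are illustrative only, not part of the repository.

```python
# Per-file test counts, copied by hand from the Summary table.
# TEST_COUNTS and summarize() are illustrative names, not repository code.
TEST_COUNTS = {
    "test_api_rest_isolation.py": 11,
    "test_cache.py": 2,
    "test_client.py": 24,
    "test_config.py": 3,
    "test_env.py": 56,
    "test_judge_policy.py": 10,
    "test_lab_manager_policy.py": 37,
    "test_models.py": 21,
    "test_logging.py": 11,
    "test_oracle.py": 5,
    "test_prompts.py": 7,
    "test_reward.py": 40,
    "test_rollout.py": 12,
    "test_rollout_traces.py": 2,
    "test_scenarios.py": 14,
    "test_scientist_policy.py": 46,
    "test_server.py": 44,
    "test_validation.py": 20,
}


def summarize(counts):
    """Return (number of files, total number of tests)."""
    return len(counts), sum(counts.values())


files, total = summarize(TEST_COUNTS)
print(f"{total} tests across {files} files")  # prints "365 tests across 18 files"
```

If a row changes, rerunning this check shows whether the Total row and the headline count need updating too.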
Coverage Notes
- The environment stack is covered end to end: test_env.py validates reset, step, invalid action, termination, reward integration, deep state snapshots, close/reopen lifecycle behavior, terminal judge-audit propagation, and seeded replay determinism across all scenario families.
- The API/server stack is covered end to end: test_server.py covers REST reset/step/scenarios, WebSocket session handling, idle timeout cleanup, CORS behavior, and replay audit propagation.
- The scientist stack is covered end to end: test_scientist_policy.py, test_prompts.py, test_rollout.py, and test_rollout_traces.py together cover prompt construction, observation formatting, parse/retry, baseline policy, rollout collection, and bounded tool trace capture.
- The judge stack is covered end to end: test_reward.py covers rubric scores and reward math, while test_judge_policy.py covers structured audit payload generation.
- The Oracle hybrid layer is covered additively: test_oracle.py, test_cache.py, and test_prompts.py cover Oracle scenario generation wrappers, cache reuse, and prompt asset loading without changing the deterministic reward contract.
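The seeded replay determinism mentioned in the notes above can be illustrated with a toy example. This is a hedged sketch, not the project's test: `ToyEnv`, its `reset(seed)`/`step(action)` signatures, and `replay` are hypothetical stand-ins, and the real environment API may differ.

```python
import random


class ToyEnv:
    """Hypothetical stand-in for the real environment (not the project's API)."""

    def __init__(self):
        self._rng = random.Random()
        self._steps = 0

    def reset(self, seed):
        # Re-seeding on reset is what makes a replay reproducible.
        self._rng = random.Random(seed)
        self._steps = 0
        return {"step": 0, "value": self._rng.random()}

    def step(self, action):
        self._steps += 1
        obs = {"step": self._steps, "value": self._rng.random() + action}
        done = self._steps >= 3  # toy termination rule
        return obs, done


def replay(seed, actions):
    """Run one episode and return the full observation trace."""
    env = ToyEnv()
    trace = [env.reset(seed)]
    for action in actions:
        obs, done = env.step(action)
        trace.append(obs)
        if done:
            break
    return trace


def test_seeded_replay_is_deterministic():
    actions = [0, 1, 0]
    # Same seed and same actions must give an identical trace.
    assert replay(seed=7, actions=actions) == replay(seed=7, actions=actions)
```

The sketch compares whole traces rather than only the final observation, so a divergence at any step fails the check.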
Remaining Gaps
| Planned test work | Why it still matters |
|---|---|
| TST 09 notebook smoke coverage | Fresh-runtime validation for the judged training notebook |
Task-to-Test Mapping
| Area | Primary test files |
|---|---|
| Models and contracts | test_models.py, test_validation.py |
| Scenarios | test_scenarios.py |
| Oracle integration and cache | test_oracle.py, test_cache.py, test_prompts.py |
| Scientist policy | test_scientist_policy.py, test_prompts.py |
| Lab Manager policy | test_lab_manager_policy.py |
| Judge and reward | test_reward.py, test_judge_policy.py |
| Environment | test_env.py |
| API and deployment-facing server behavior | test_server.py |
| Client and training rollouts | test_client.py, test_rollout.py, test_rollout_traces.py |
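The mapping above can be used to run only the tests for one area. The sketch below copies the table into a dictionary and builds the file paths to pass to `pytest`; `AREA_TO_FILES`, `pytest_args`, and the `tests` directory default are illustrative assumptions, not repository code.

```python
# Area-to-file mapping, copied by hand from the Task-to-Test Mapping table.
# AREA_TO_FILES and pytest_args() are illustrative names, not repository code.
AREA_TO_FILES = {
    "Models and contracts": ["test_models.py", "test_validation.py"],
    "Scenarios": ["test_scenarios.py"],
    "Oracle integration and cache": ["test_oracle.py", "test_cache.py", "test_prompts.py"],
    "Scientist policy": ["test_scientist_policy.py", "test_prompts.py"],
    "Lab Manager policy": ["test_lab_manager_policy.py"],
    "Judge and reward": ["test_reward.py", "test_judge_policy.py"],
    "Environment": ["test_env.py"],
    "API and deployment-facing server behavior": ["test_server.py"],
    "Client and training rollouts": [
        "test_client.py",
        "test_rollout.py",
        "test_rollout_traces.py",
    ],
}


def pytest_args(area, tests_dir="tests"):
    """Return the test file paths for one area, relative to the repo root."""
    return [f"{tests_dir}/{name}" for name in AREA_TO_FILES[area]]


print(" ".join(["pytest", *pytest_args("Judge and reward")]))
# prints "pytest tests/test_reward.py tests/test_judge_policy.py"
```

An unknown area name raises `KeyError`, which surfaces typos immediately instead of silently running nothing.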