Tests Map - `tests/`

365 tests across 18 files. All passing.

Last verified: 2026-03-08

| File | Tests | What it covers |
| --- | ---: | --- |
| `test_api_rest_isolation.py` | 11 | `API 14` REST session isolation and replay separation |
| `test_cache.py` | 2 | Oracle scenario caching and reuse |
| `test_client.py` | 24 | `TRN 13` reusable client over REST and WebSocket |
| `test_config.py` | 3 | Shared constants and config consistency |
| `test_env.py` | 56 | `ENV 01-08`, `ENV 10`, `ENV 11`, `OBS 04`, `JDG 04-05`, `TST 01-03` |
| `test_judge_policy.py` | 10 | `JDG 11` structured judge audit payload |
| `test_lab_manager_policy.py` | 37 | `AGT 05-07` plus `AGT 09` determinism coverage |
| `test_models.py` | 21 | Action, observation, step, state, and log contracts |
| `test_logging.py` | 11 | `MOD 07` replay persistence and `JDG 07` CSV logging helpers |
| `test_oracle.py` | 5 | Oracle hybrid wrapper, structured parsing, and env reset adapter |
| `test_prompts.py` | 7 | `AGT 10` prompt files and Oracle prompt asset loading |
| `test_reward.py` | 40 | `JDG 01-06`, `JDG 08`, and reward regression coverage |
| `test_rollout.py` | 12 | `TRN 03` rollout worker behavior |
| `test_rollout_traces.py` | 2 | `TRN 04` bounded tool trace aggregation and batched collection |
| `test_scenarios.py` | 14 | `SCN 01-13` scenario generation, determinism, and Oracle scenario adaptation |
| `test_scientist_policy.py` | 46 | `MOD 09`, `AGT 01-04`, `AGT 08` |
| `test_server.py` | 44 | `API 01-04`, `API 06-08`, `API 13-14`, replay audit propagation, and root landing page |
| `test_validation.py` | 20 | `MOD 05-06` semantic validation |
| Total | 365 | |
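
The headline figures can be cross-checked against the per-file counts. The snippet below is a minimal sketch that only re-adds the numbers listed in the table; it does not collect or run any tests, and the names `TEST_COUNTS` and `summarize` are illustrative rather than part of the repository.

```python
# Per-file test counts copied from the table in this document.
TEST_COUNTS = {
    "test_api_rest_isolation.py": 11,
    "test_cache.py": 2,
    "test_client.py": 24,
    "test_config.py": 3,
    "test_env.py": 56,
    "test_judge_policy.py": 10,
    "test_lab_manager_policy.py": 37,
    "test_models.py": 21,
    "test_logging.py": 11,
    "test_oracle.py": 5,
    "test_prompts.py": 7,
    "test_reward.py": 40,
    "test_rollout.py": 12,
    "test_rollout_traces.py": 2,
    "test_scenarios.py": 14,
    "test_scientist_policy.py": 46,
    "test_server.py": 44,
    "test_validation.py": 20,
}


def summarize(counts):
    """Return (number of files, total number of tests)."""
    return len(counts), sum(counts.values())


file_count, total = summarize(TEST_COUNTS)
print(f"{total} tests across {file_count} files")  # 365 tests across 18 files
```

Re-running this after editing the table is a cheap way to keep the "365 tests across 18 files" line honest.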

The environment stack is covered end to end:
- `test_env.py` validates reset, step, invalid action, termination, reward integration, deep state snapshots, close/reopen lifecycle behavior, terminal judge-audit propagation, and seeded replay determinism across all scenario families.

The API/server stack is covered end to end:
- `test_server.py` covers REST reset/step/scenarios, WebSocket session handling, idle timeout cleanup, CORS behavior, and replay audit propagation.

The scientist stack is covered end to end:
- `test_scientist_policy.py`, `test_prompts.py`, `test_rollout.py`, and `test_rollout_traces.py` together cover prompt construction, observation formatting, parse/retry, baseline policy, rollout collection, and bounded tool trace capture.

The judge stack is covered end to end:
- `test_reward.py` covers rubric scores and reward math, while `test_judge_policy.py` covers structured audit payload generation.

The Oracle hybrid layer is covered additively:
- `test_oracle.py`, `test_cache.py`, and `test_prompts.py` cover Oracle scenario generation wrappers, cache reuse, and prompt asset loading without changing the deterministic reward contract.
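
To illustrate what "seeded replay determinism" means in practice, here is a minimal sketch of that style of check. It assumes a toy environment: `ToyEnv`, `run_episode`, the action names, and the observation fields are all hypothetical stand-ins and are not the ReplicaLab API or the actual contents of `test_env.py`.

```python
import random


class ToyEnv:
    """Hypothetical stand-in environment; NOT the ReplicaLab API."""

    def reset(self, seed):
        # All randomness flows from one seeded generator.
        self._rng = random.Random(seed)
        self._steps = 0
        return {"budget": self._rng.randint(50, 100)}

    def step(self, action):
        self._steps += 1
        reward = round(self._rng.random(), 6)
        done = self._steps >= 3  # toy termination rule: three steps
        return {"last_action": action}, reward, done


def run_episode(seed, actions):
    """Replay a fixed action sequence and record everything observed."""
    env = ToyEnv()
    trace = [env.reset(seed)]
    for action in actions:
        obs, reward, done = env.step(action)
        trace.append((obs, reward, done))
        if done:
            break
    return trace


def test_seeded_replay_is_deterministic():
    actions = ["propose", "revise", "accept"]
    # Same seed and same actions must give an identical trace.
    assert run_episode(7, actions) == run_episode(7, actions)


test_seeded_replay_is_deterministic()
```

The design point is that the comparison covers the full trace (observations, rewards, and termination flags), so any hidden source of nondeterminism shows up as a mismatch.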

| Planned test work | Why it still matters |
| --- | --- |
| `TST 09` notebook smoke coverage | Fresh-runtime validation for the judged training notebook |

| Area | Primary test files |
| --- | --- |
| Models and contracts | `test_models.py`, `test_validation.py` |
| Scenarios | `test_scenarios.py` |
| Oracle integration and cache | `test_oracle.py`, `test_cache.py`, `test_prompts.py` |
| Scientist policy | `test_scientist_policy.py`, `test_prompts.py` |
| Lab Manager policy | `test_lab_manager_policy.py` |
| Judge and reward | `test_reward.py`, `test_judge_policy.py` |
| Environment | `test_env.py` |
| API and deployment-facing server behavior | `test_server.py` |
| Client and training rollouts | `test_client.py`, `test_rollout.py`, `test_rollout_traces.py` |
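
The area map above can double as a lookup for running only the relevant files. The following is a sketch under stated assumptions: the helper `pytest_args` and the dictionary name are illustrative, and the `tests` directory prefix is inferred from the title of this document rather than confirmed against the repository layout.

```python
# Area-to-file mapping copied from the table in this document.
AREA_TEST_FILES = {
    "Models and contracts": ["test_models.py", "test_validation.py"],
    "Scenarios": ["test_scenarios.py"],
    "Oracle integration and cache": [
        "test_oracle.py", "test_cache.py", "test_prompts.py",
    ],
    "Scientist policy": ["test_scientist_policy.py", "test_prompts.py"],
    "Lab Manager policy": ["test_lab_manager_policy.py"],
    "Judge and reward": ["test_reward.py", "test_judge_policy.py"],
    "Environment": ["test_env.py"],
    "API and deployment-facing server behavior": ["test_server.py"],
    "Client and training rollouts": [
        "test_client.py", "test_rollout.py", "test_rollout_traces.py",
    ],
}


def pytest_args(area, tests_dir="tests"):
    """Build the file arguments for a pytest run limited to one area.

    `tests_dir` is an assumed location for the test files.
    """
    return [f"{tests_dir}/{name}" for name in AREA_TEST_FILES[area]]


print(pytest_args("Judge and reward"))
# ['tests/test_reward.py', 'tests/test_judge_policy.py']
```

The resulting list can be appended to a `python -m pytest` invocation when only one area has changed.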