replicalab / docs /ayush /task_list.md
maxxie114's picture
Initial HF Spaces deployment
80d8c84

Person B (Ayush) Task List

Source of truth: ReplicaLab_Comprehensive_Task_Division.md


Current status

  • All Ayush-owned implementation tasks are now complete.
  • TST 09 is now complete after the fresh-runtime smoke checklist was both written and exercised against the live ART/OpenEnv path.
  • The active training bottleneck is no longer missing infrastructure in Ayush's lane; it is model quality.
  • The current live Scientist ART checkpoint (step6) still underperforms the deterministic baseline on held-out comparison, so the next gains will come from better data, curriculum, reward shaping, and policy tuning rather than missing plumbing.

Epic E02. Domain Models

  • MOD 09 | Add output parser that maps model text to ScientistAction | 0.75h | Depends: MOD 01 | Status: completed on 2026-03-08

Epic E03. Scenario Engine

  • SCN 11 | Create hand checked golden scenarios for prompt testing | 0.75h | Depends: SCN 09 | Status: completed on 2026-03-08

Epic E04. Scientist Agent and Lab Manager Policy

  • AGT 01 | Draft domain-neutral system prompt for Scientist role from normalized scenario data | 0.75h | Depends: MOD 01, SCN 11 | Status: completed on 2026-03-08
  • AGT 02 | Build observation to prompt formatting helper from normalized scenario-derived observations | 0.75h | Depends: AGT 01, MOD 03 | Status: completed on 2026-03-08
  • AGT 03 | Add parse plus retry strategy for malformed model output | 0.75h | Depends: MOD 09, AGT 02 | Status: completed on 2026-03-07
  • AGT 04 | Build baseline heuristic Scientist for non trained smoke tests | 1h | Depends: AGT 02 | Status: completed on 2026-03-08
  • AGT 05 | Implement deterministic feasibility checker over normalized constraints and resources (shared with Person A) | 1.25h | Depends: SCN 07, MOD 05 | Status: completed on 2026-03-08
  • AGT 06 | Implement alternative suggestion logic from allowed substitutions and tradeoffs | 1h | Depends: AGT 05, SCN 08 | Status: completed on 2026-03-08
  • AGT 07 | Add model-backed Lab Manager response synthesis from checker output | 0.75h | Depends: AGT 05 | Status: completed on 2026-03-08
  • AGT 08 | Add prompt formatting and parse tests | 0.75h | Depends: AGT 01 to AGT 04 | Status: completed on 2026-03-07
  • AGT 10 | Write domain-neutral prompt text files for all three roles | 0.75h | Depends: AGT 01, AGT 07, JDG 06 | Status: completed on 2026-03-08
  • AGT 11 | Select and document base model for Scientist training | 0.5h | Depends: AGT 01 | Status: completed on 2026-03-08

Epic E05. Judge Engine and Reward

  • JDG 10 | Expose component metrics for training plots | 0.5h | Depends: JDG 05, JDG 07 | Status: completed on 2026-03-08

Epic E08. RL Training Pipeline

  • TRN 01 | Create notebook skeleton | 0.5h | Depends: API 10 | Status: completed on 2026-03-08
  • TRN 02 | Add package install and model setup cell | 0.75h | Depends: TRN 01 | Status: completed on 2026-03-08
  • TRN 03 | Implement environment client wrapper | 1h | Depends: API 06 | Status: completed on 2026-03-08
  • TRN 04 | Implement rollout collection loop | 1h | Depends: TRN 03, AGT 01 | Status: completed on 2026-03-08
  • TRN 05 | Connect rollouts to GRPO or equivalent trainer | 1.25h | Depends: TRN 04 | Status: completed on 2026-03-08
  • TRN 06 | Log episode reward, rigor, feasibility, fidelity, rounds | 0.75h | Depends: JDG 10, TRN 04 | Status: completed on 2026-03-08
  • TRN 07 | Plot reward curve and component curves | 0.5h | Depends: TRN 06 | Status: completed on 2026-03-08
  • TRN 08 | Add before versus after evaluation on fixed seeds | 1h | Depends: SCN 11, TRN 05 | Status: completed on 2026-03-08
  • TRN 09 | Add policy loading path for trained adapter | 0.5h | Depends: TRN 05 | Status: completed on 2026-03-08
  • TRN 10 | Export plot image and sample logs to outputs/plots | 0.25h | Depends: TRN 07 | Status: completed on 2026-03-08
  • TRN 13 | Create reusable environment client module (client.py) | 1h | Depends: API 06 | Status: completed on 2026-03-08
  • TRN 14 | Select and document base model (notebook side) | 0.5h | Depends: TRN 01 | Status: completed on 2026-03-08 | Assumption now iterated to: Qwen3.5-9B primary, Qwen3.5-4B fallback, Qwen3.5-122B-A10B audit-only judge candidate
  • TRN 15 | Add agreement rate and invalid action rate aggregation | 0.5h | Depends: TRN 06, TRN 08, OBS 09 | Status: completed on 2026-03-08

Epic E10. Logging and Observability

  • OBS 06 | Log training run metadata | 0.5h | Depends: TRN 06 | Status: completed on 2026-03-08

Epic E11. Testing

  • TST 09 | Create notebook smoke test for fresh runtime | 0.5h | Depends: TRN 12 | Status: completed on 2026-03-08 after executing the smoke checklist against the live ART/OpenEnv path

Shared Tasks

  • FND 08 | Freeze JSON contract for actions and observations (with Person A) | 0.75h | Depends: FND 04 | Status: completed and signed off

Totals

Metric Value
Total tasks 29
Completed 29
Remaining 0
Total estimated hours 0h