Clark β Foundation RL Model for Warehouse Workforce Scheduling
Clark is a transformer + LSTM PPO agent that schedules a warehouse workforce. It pre-trains once on thousands of synthetic facilities, then fine-tunes to a specific facility in 50 episodes (3 h on a consumer GPU). One foundation model, many facilities β variable numbers of workers and tasks, no per-site retrain from scratch.
- Code, docs, CLI, setup wizard, operations dashboard: https://github.com/jarmstrong158/Clark
- Architecture (
clark-v2.5, ~18M params): per step, workers and tasks are tokenized separately; workers self-attend, then cross-attend to tasks; an LSTM carries state across the simulated year; per-worker assignment + hustle heads sample under action masks. Trained with PPO using per-worker importance-sampling ratios (IPPO-style), symlog value targets (DreamerV3), and a completion-dominant order reward. - This checkpoint: refinement level v2.11 β carries the full v2.5βv2.11 chain (multi-gate filler mask, restock/OT/management tuning, management-backlog observation, and the task-flow ramp + minimum-dwell cadence work that makes workers hold tasks in realistic ~30β60-min blocks).
Intended use
Generate daily shift plans (per-worker, per-10-min task assignments), project a day's outcome (grade + completion distribution), and answer staffing questions for a configured facility. Inputs are out-of-distribution beyond the trained bounds (3β50 workers, 3β15 tasks, see clark/config/clark_limits.yaml); expanding them requires retraining.
Usage
git clone https://github.com/jarmstrong158/Clark.git && cd Clark
pip install -e .
# download this checkpoint to where Clark expects it
hf download Roflimjonny/clark-foundation clark_foundation.pt \
--local-dir clark/data/checkpoints
# serve it (deploy at temperature tau ~ 1.0 β argmax catastrophically underperforms)
clark serve --model clark/data/checkpoints/clark_foundation.pt --facilities-dir clark/data/configs
# or fine-tune to your facility
clark finetune --config my_warehouse.yaml --base clark/data/checkpoints/clark_foundation.pt --episodes 50
Serve-time temperature matters. This policy is trained in a distribution-mixing regime (PPO entropy bonus). Argmax inference drops to ~13% ship-win on stage-3 configs vs ~93% at
tau β 1.0. Deploy attau β 1.0.
Performance
Validated on the predecessor "Jack" single-facility setup, translated faithfully to a Clark config and simulated over a full work-year:
| Metric | v2.10 foundation alone (no per-facility training) | + 50 fine-tune episodes |
|---|---|---|
| A-grade days | 57.5% (matches Jack-from-scratch's 58%) | 62.1% |
| A + B days | 85.1% | 95.8% |
| F-grade days | 15.0% | 4.2% |
| Per-facility training | none |
v2.11 additionally makes the per-worker schedule realistic β task switching drops from ~29/worker/day (a flip every ~10 min) to ~14 (every ~38 min) via a structural minimum-dwell mask, with grades held (A+B ~90%, 100% completion on the validation facility). Full methodology, the honest F-rate read, and the v2.5βv2.11 iteration log are in the repo README and CHANGELOG.
Held-out generalization
On 20 freshly-sampled stage-3 facilities the model never trained on (hardest tier β up to 50 workers, deliberate-overload days), each a full simulated work-year scored by the in-env production grader: median A+B 97.5% (p10βp90 65.1β100), A 76.5%, F 0.5%, order completion 100%. Reproduce with clark eval --n-per-stage 20 --stages 3.
Vs. a strong classical baseline (honest comparison)
Clark is benchmarked against a deliberately strong rule-based heuristic scheduler (bottleneck-aware dispatching, same action masks, same grader, same 20 held-out facilities). The honest result:
| Metric (median, 20 held-out facilities) | Heuristic scheduler | Clark |
|---|---|---|
| A + B grade days | 98.3% | 97.5% |
| Order completion | 100% | 100% |
| A days (no overtime) | 43.3% | 76.5% |
| F days | 1.7% | 0.5% |
| A+B worst-case (p10) | 56.9 | 65.1 |
A well-engineered heuristic ties Clark on throughput (A+B, completion). Clark's edge is specific: it finishes without overtime ~33 pp more often (less paid OT), has ~3Γ fewer catastrophic days, and generalizes across facilities with zero per-site tuning (the heuristic's constants are hand-fit to one distribution). A separate CP-SAT (constraint-programming) bound confirms throughput is never the binding constraint β the difference is in balancing the soft quality objectives. Write-up: ENGINEERING_NOTES Β§9βΒ§10.
Limitations
- Numbers above are on synthetic facilities + the Jack-translated config; real facilities will vary, and a per-facility fine-tune is recommended.
- A small irreducible F-rate (~2β4%) on the hardest stress configs is by design (days that exceed rescue capacity), not reward-hacked away.
- Trained for the bounds in
clark_limits.yaml; out-of-distribution facilities need retraining.
License
Released under PolyForm Noncommercial 1.0.0 β the same license as the source. Free to download, run, study, evaluate, and fine-tune for any noncommercial purpose (research, personal, educational, journalism). Running Clark in production for a for-profit operation, or selling a product/service built on it, requires a separate commercial agreement β open a commercial-access issue on the GitHub repo.
Citation
@software{armstrong_clark,
author = {Armstrong, Jonathan},
title = {Clark: a foundation reinforcement learning model for warehouse workforce scheduling},
url = {https://github.com/jarmstrong158/Clark}
}