metadata
name: HF before delete
description: >-
Always upload valuable data (ckpts, logs, predictions) to Hugging Face BEFORE
deleting from local disk. Don't lose data to disk pressure.
type: feedback
originSessionId: 4037f43b-2133-46c6-84bd-02f7d454ec8b
Rule: Before running rm / git rm / quarantine-delete on any artifact under /workspace/dnathinker/ or /shm/dnathinker_quarantine/, first upload to HF if it isn't already mirrored.
Why: User direction (2026-05-05) after I deleted Phase-8 RL through-MDLM log.jsonl files from /shm/dnathinker_quarantine/ during cycle 50 cleanup. Those logs were the ONLY source for the F6_rl_training_curves.pdf paper figure. They were not on HF; deletion was irreversible. Result: F6 panel can't be regenerated and we lost reproducibility.
How to apply:
- Before any rm on /workspace/dnathinker/runs/ or /shm/dnathinker_quarantine/, check MANIFEST.tsv for an HF: <repo>/<path> annotation.
- If absent, run scripts/innovations/hf_auto_uploader.py --once (or specifically hf_upload_finished_models.py for ckpts) and verify the upload succeeded BEFORE deleting.
- For run-dir cleanup: keep log.jsonl, manifest.json, train.log, eval_*.log, and any *_score*.json/md; they're tiny and used to regenerate paper figures. ckpts can be deleted if HF-mirrored.
- Annotate the deletion in /shm/dnathinker_quarantine/MANIFEST.tsv with the HF reference.
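
The checks above could be sketched as a small pre-delete guard. This is only an illustration under stated assumptions: the real column layout of MANIFEST.tsv is not described in this note, so the sketch assumes tab-separated rows where one field is the artifact path and another field starts with "HF:". The names KEEP_PATTERNS, must_keep, hf_reference, and safe_to_delete are hypothetical and are not part of the existing scripts.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath
from typing import Optional

# Small files the note says to keep during run-dir cleanup.
KEEP_PATTERNS = (
    "log.jsonl", "manifest.json", "train.log", "eval_*.log",
    "*_score*.json", "*_score*.md",
)


def must_keep(path: str) -> bool:
    """True if the file name matches one of the keep patterns."""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pat) for pat in KEEP_PATTERNS)


def hf_reference(manifest_text: str, artifact: str) -> Optional[str]:
    """Return the 'HF: <repo>/<path>' field from the row naming `artifact`.

    ASSUMPTION: rows are tab-separated, one field equals the artifact path,
    and the HF annotation is a field that starts with 'HF:'.
    """
    for row in manifest_text.splitlines():
        fields = [f.strip() for f in row.split("\t")]
        if artifact in fields:
            for field in fields:
                if field.startswith("HF:"):
                    return field
    return None


def safe_to_delete(manifest_text: str, artifact: str) -> bool:
    """Allow deletion only for non-keep files that carry an HF annotation."""
    if must_keep(artifact):
        return False
    return hf_reference(manifest_text, artifact) is not None
```

The guard only reads the manifest. It does not confirm that the file actually exists on HF, so the "verify the upload succeeded" step still has to happen separately before the annotation is trusted.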
Exceptions: tmp files (/tmp/*, _bench_logs rotations explicitly marked stale, intermediate dataloader caches that are fast to regenerate) don't need an HF round-trip.
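
The scope of the rule and its exceptions could be expressed as a single predicate. This is a minimal sketch, not an existing helper: needs_hf_roundtrip and the marked_exempt flag are hypothetical. Stale _bench_logs rotations and regenerable dataloader caches cannot be recognised from a path alone, so the sketch assumes the caller marks them explicitly.

```python
from pathlib import PurePosixPath

# Roots named in the rule; anything under them is protected by default.
PROTECTED_ROOTS = (
    PurePosixPath("/workspace/dnathinker"),
    PurePosixPath("/shm/dnathinker_quarantine"),
)
TMP_ROOT = PurePosixPath("/tmp")


def _under(path: PurePosixPath, root: PurePosixPath) -> bool:
    """True if `path` is `root` itself or lies somewhere below it."""
    return path == root or root in path.parents


def needs_hf_roundtrip(path: str, marked_exempt: bool = False) -> bool:
    """Decide whether `path` must be mirrored to HF before deletion.

    `marked_exempt` is a hypothetical flag for artifacts a human has
    explicitly marked stale or cheap to regenerate.
    """
    p = PurePosixPath(path)
    if marked_exempt or _under(p, TMP_ROOT):
        return False
    return any(_under(p, root) for root in PROTECTED_ROOTS)
```

The check is purely lexical: it does not resolve symlinks or ".." segments, so paths should be normalised before they are passed in.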