metadata
name: HF before delete
description: >-
  Always upload valuable data (ckpts, logs, predictions) to Hugging Face BEFORE
  deleting from local disk. Don't lose data to disk pressure.
type: feedback
originSessionId: 4037f43b-2133-46c6-84bd-02f7d454ec8b

Rule: Before running rm, git rm, or a quarantine-delete on any artifact under /workspace/dnathinker/ or /shm/dnathinker_quarantine/, first upload it to Hugging Face (HF) unless it is already mirrored there.
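A minimal sketch of the scope check in the rule. It assumes paths are already absolute and normalized (for example via os.path.realpath), and is_protected is a hypothetical helper name, not an existing project script. Comparing path components instead of string prefixes keeps a sibling directory such as /workspace/dnathinker2/ from counting as protected.

```python
from pathlib import PurePosixPath

# Roots named in the rule; anything under them must be HF-mirrored before deletion.
PROTECTED_ROOTS = (
    PurePosixPath("/workspace/dnathinker"),
    PurePosixPath("/shm/dnathinker_quarantine"),
)


def is_protected(path: str) -> bool:
    """Return True if `path` is a protected root or lies underneath one.

    Assumes `path` is absolute and already normalized (no '..' components);
    PurePosixPath does not resolve those on its own.
    """
    p = PurePosixPath(path)
    return any(p == root or root in p.parents for root in PROTECTED_ROOTS)
```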

Why: The user gave this instruction (2026-05-05) after I deleted the Phase-8 RL-through-MDLM log.jsonl files from /shm/dnathinker_quarantine/ during the cycle 50 cleanup. Those logs were the ONLY source for the F6_rl_training_curves.pdf paper figure. They were not mirrored on HF, so the deletion was irreversible. As a result, the F6 panel cannot be regenerated and the figure is no longer reproducible.

How to apply:

  1. Before any rm on /workspace/dnathinker/runs/ or /shm/dnathinker_quarantine/, check MANIFEST.tsv for an HF: <repo>/<path> annotation.
  2. If the annotation is absent, run scripts/innovations/hf_auto_uploader.py --once (or hf_upload_finished_models.py specifically for ckpts) and verify that the upload succeeded BEFORE deleting.
  3. For run-dir cleanup, keep log.jsonl, manifest.json, train.log, eval_*.log, and any *_score*.json or *_score*.md files. They are tiny and are needed to regenerate paper figures. Checkpoints (ckpts) can be deleted once they are HF-mirrored.
  4. Annotate the deletion in /shm/dnathinker_quarantine/MANIFEST.tsv with the HF reference.
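Steps 1, 2 and 4 can be sketched as a guarded delete. This is a minimal sketch that assumes a hypothetical MANIFEST.tsv layout: each row is tab-separated and contains the local path plus a field of the form "HF: <owner>/<repo>/<path-in-repo>". The names hf_ref_for and delete_if_mirrored are illustrative and are not existing project scripts. The HF existence check is passed in as a callable so the sketch runs without network access. In practice, you could wrap huggingface_hub's file_exists helper in it. The upload itself (step 2) stays with the project's uploader scripts.

```python
import shutil
from pathlib import Path
from typing import Callable, Optional, Tuple


def hf_ref_for(manifest: Path, local_path: str) -> Optional[Tuple[str, str]]:
    """Return (repo_id, path_in_repo) for local_path, or None if not annotated.

    Assumed (hypothetical) row format: tab-separated fields including the local
    path and 'HF: <owner>/<repo>/<path-in-repo>'.
    """
    if not manifest.exists():
        return None
    for line in manifest.read_text().splitlines():
        fields = [f.strip() for f in line.split("\t")]
        if local_path not in fields:
            continue
        for field in fields:
            if field.startswith("HF:"):
                parts = field[len("HF:"):].strip().split("/", 2)
                if len(parts) == 3 and all(parts):
                    return f"{parts[0]}/{parts[1]}", parts[2]
    return None


def delete_if_mirrored(
    manifest: Path,
    local_path: str,
    exists_on_hf: Callable[[str, str], bool],
) -> bool:
    """Delete local_path only if the manifest names an HF copy that really exists."""
    ref = hf_ref_for(manifest, local_path)
    if ref is None or not exists_on_hf(*ref):
        return False  # not mirrored: upload first (step 2), then retry
    target = Path(local_path)
    if target.is_dir():
        shutil.rmtree(target)
    else:
        target.unlink(missing_ok=True)
    # Step 4: record the deletion alongside its HF reference.
    text = manifest.read_text()
    prefix = "\n" if text and not text.endswith("\n") else ""
    with manifest.open("a") as fh:
        fh.write(f"{prefix}{local_path}\tDELETED\tHF: {ref[0]}/{ref[1]}\n")
    return True
```

This design fails closed. A missing manifest row, an unparseable HF reference, or a failed remote check all leave the file on disk.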

Exceptions: temporary or cheaply regenerated files (/tmp/*, _bench_logs rotations explicitly marked stale, and intermediate dataloader caches that are fast to regenerate) don't need an HF round-trip.
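Step 3 and the exceptions can be combined into a small classifier. This is only a sketch, and cleanup_action is a hypothetical helper. The memory doesn't say how stale _bench_logs rotations or cheap dataloader caches are identified, so the caller supplies those as flags rather than the function inferring them from the path.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Step 3: small files needed to regenerate paper figures; never delete in cleanup.
KEEP_PATTERNS = (
    "log.jsonl", "manifest.json", "train.log", "eval_*.log",
    "*_score*.json", "*_score*.md",
)


def cleanup_action(path: str, marked_stale: bool = False, cheap_cache: bool = False) -> str:
    """Classify a file as 'keep', 'delete' (no HF needed) or 'mirror_then_delete'.

    marked_stale and cheap_cache are caller-supplied assumptions, since the
    rule does not define how those files are recognized.
    """
    p = PurePosixPath(path)
    if p.parts[:2] == ("/", "tmp"):
        return "delete"  # exception: /tmp/* needs no HF round-trip
    if any(fnmatchcase(p.name, pat) for pat in KEEP_PATTERNS):
        return "keep"
    if "_bench_logs" in p.parts and marked_stale:
        return "delete"  # exception: stale bench-log rotations
    if cheap_cache:
        return "delete"  # exception: fast-to-regenerate dataloader caches
    return "mirror_then_delete"  # e.g. ckpts: upload to HF before rm
```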