E-Rong / til-26-ae-agent

ml-intern

E-Rong committed 1 day ago · Commit 9bdda09 (verified) · 1 parent: 1e7c6af

Compact AGENTS.md into zero-memory survival guide

Files changed (1): AGENTS.md +95 -253

@@ -1,311 +1,153 @@
-# AGENTS.md — Context & Lessons for Future Sessions
-> This file exists because sandboxes reset and I (the agent) lose all memory.
-> **READ THIS FIRST** before doing anything on this project.
 ---
-## What This Project Is
-- **Challenge**: TIL-26-AE (The Intelligent League — Automated Exploration)
-- **Game**: Multi-agent Bomberman on a 16×16 grid
-- **My Role**: Train `agent_0` via RL to compete autonomously
-- **Main Repo**: `E-Rong/til-26-ae-agent` (models, checkpoints, scripts, docs)
-- **Space**: `e-rong/til-26-ae` (evaluation server with `ae/src/ae_manager.py`)
-- **TIL Source**: Private Space `e-rong/til-26-ae` — contains `til_environment/` module
 ---
-## CRITICAL: What Killed Training & Cost Money
-### ❌ NEVER USE SANDBOXES FOR TRAINING > 30 MINUTES
-Sandboxes are **interactive dev environments**. They:
-- Recycle after inactivity / timeout
-- Kill processes silently
-- **Keep billing you while empty** after the process dies
-**Damage done**: ~$4.87 wasted across 4 sandbox sessions where training died but billing continued.
-### ✅ ALWAYS USE HF JOBS FOR BATCH TRAINING
-- Persistent GPU allocation
-- Runs until completion (or your timeout)
-- Fails visibly if something breaks (no silent empty billing)
-- Must set `namespace="E-Rong"` to bill the org, not the user
-### ❌ NEVER `git clone` A PRIVATE REPO IN AN HF JOB
-`git clone https://huggingface.co/spaces/...` fails because git does not read `HF_TOKEN`.
-**Use instead**:
-```python
-from huggingface_hub import snapshot_download
-snapshot_download(
-    repo_id='e-rong/til-26-ae',
-    repo_type='space',
-    local_dir='/app/til-26-ae-repo'
-)
-```
-`snapshot_download` auto-uses the `HF_TOKEN` env var.
-### ✅ ALWAYS SMOKE-TEST A JOB BEFORE THE FULL RUN
-Submit a 5-minute job that:
-1. Downloads the TIL repo
-2. Installs deps
-3. Runs 100 training steps
-4. Saves a dummy checkpoint to the Hub
-Only after this succeeds, submit the multi-hour job.
 ---
-## Session Startup Checklist
-Before doing **anything** on this project:
-1. [ ] Read `session_state.json` from `E-Rong/til-26-ae-agent`
-2. [ ] Read this file (`AGENTS.md`)
-3. [ ] Check latest checkpoint on Hub (sort `phase*_ckpt_*.zip` files)
-4. [ ] Determine current phase and remaining steps
-5. [ ] If training needed: write script to sandbox, **smoke-test in HF Job first**
 ---
-## 📋 ALWAYS UPDATE DOCS BEFORE STARTING LONG-RUNNING TASKS
-> **This rule prevents lost context when sessions crash or reset.**
-Before submitting any multi-hour HF Job or starting any long-running compute:
-1. **Update `session_state.json`** with:
-   - Current phase and status
-   - What you are about to do (job_id if resuming, script name, hardware, timeout)
-   - Why you are doing it (link to research/decisions)
-   - Expected completion time
-2. **Update `AGENTS.md`** if you learned anything new:
-   - New mistakes or fixes
-   - New technical decisions with rationale
-   - Cost lessons
-   - API gotchas
-3. **Update `docs/ae.md`** with research findings:
-   - New papers read (arxiv IDs, key insights)
-   - New datasets or methods discovered
-   - Results from completed phases
-4. **Push all updates to the Hub** BEFORE starting the job:
-   ```python
-   hf_repo_files(operation="upload", repo_id="E-Rong/til-26-ae-agent",
-                 path="session_state.json", content=...)
-   ```
-**Why this matters**: If your session resets while a job is running, the next version of you has ZERO memory. The only way to reconstruct state is from the Hub. If docs are stale, you'll waste time (and money) redoing work or making the same mistakes.
-**This rule applies to**:
-- Any HF Job with `timeout > 30m`
-- Any smoke test (even 5-minute ones — document what you're testing)
-- Any evaluation run > 100 episodes
-- Any data processing that takes > 15 minutes
----
-## Technical Decisions That Work
-### MaskablePPO + Action Masking
-- `sb3_contrib.MaskablePPO` with `ActionMasker`
-- Bomberman has `action_mask: uint8[6]` — walls/edges make moves illegal
-- Standard PPO wastes ~30-40% samples on illegal actions early on
-- **Papers**: Huang et al. "Superstition, Imagination, and the Invalid Action Problem" (arxiv:2006.14171)
-### Observation Flattening
-1511-dim vector from dict observation:
-```
-agent_viewcone:  7×5×25 = 875
-base_viewcone:   5×5×25 = 625
-direction, location[2], base_location[2], health, frozen_ticks,
-base_health, team_resources, team_bombs, step = 11 scalars
-Total: 1511
 ```
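The 1511-dim layout above can be sketched with NumPy. This is a minimal sketch that assumes two things: the observation dict uses exactly the key names listed (`agent_viewcone`, `frozen_ticks`, `step`, ...), and every field can be cast to float. The concatenation order is also an assumption, and it must match whatever order the trained model saw. `flatten_obs` is a hypothetical helper, not taken from `til_environment`.

```python
import numpy as np

# Assumed concatenation order; it must match the order used at training time.
VIEWCONE_KEYS = ("agent_viewcone", "base_viewcone")  # shapes (7, 5, 25) and (5, 5, 25)
VECTOR_KEYS = ("direction", "location", "base_location", "health",
               "frozen_ticks", "base_health", "team_resources",
               "team_bombs", "step")  # 1 + 2 + 2 + 6 = 11 values


def flatten_obs(obs: dict) -> np.ndarray:
    """Concatenate the dict observation into one float32 vector (hypothetical helper)."""
    parts = [np.asarray(obs[k], dtype=np.float32).ravel()
             for k in VIEWCONE_KEYS + VECTOR_KEYS]
    return np.concatenate(parts)


# Dummy observation with the shapes listed above (values are placeholders).
dummy = {
    "agent_viewcone": np.zeros((7, 5, 25)),
    "base_viewcone": np.zeros((5, 5, 25)),
    "direction": 0, "location": [3, 4], "base_location": [0, 0],
    "health": 100, "frozen_ticks": 0, "base_health": 100,
    "team_resources": 0, "team_bombs": 1, "step": 0,
}
vec = flatten_obs(dummy)  # 875 + 625 + 11 = 1511 entries
```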
-### Wrapper Order (CRITICAL)
-```python
-# CORRECT
-env = ActionMasker(base_env, lambda e: e.action_masks())
-env = Monitor(env)
-# WRONG — Monitor blocks action_masks() exposure
-env = ActionMasker(Monitor(base_env), ...)  # DON'T DO THIS
 ```
-### 3-Phase Curriculum
-| Phase | Opponent | Duration | Purpose |
-|---|---|---|---|
-| 1 | Random | 500k | Learn basics |
-| 2 | Random + visit-count shaping | 500k | Prevent camping |
-| 3 | Rule-based curriculum | 1M | Generalize to structured opponents |
-### Checkpointing Every 50k Steps
-- Local + Hub push via `HfApi.upload_file()`
-- Saved the project when sandboxes reset at 400k and 600k steps
----
-## Technical Decisions That Failed
-| Decision | Why It Failed | Fix |
-|---|---|---|
-| Training in sandboxes | Process died, empty sandbox kept billing | Use HF Jobs |
-| `git clone` in HF Job | No auth for private repo | `snapshot_download` |
-| Inline 20KB script in `hf_jobs.script` | Delivery mechanism choked | Write to sandbox file first, submit path |
-| No session state on Hub | Lost track of progress across resets | `session_state.json` + this file |
-| `Monitor` inside `ActionMasker` | `get_action_masks()` failed | `ActionMasker` → `Monitor` order |
 ---
-## Cost Awareness
-| Hardware | $/hr | Good For |
-|---|---|---|
-| `cpu-basic` | ~$0.05 | Writing scripts, reading files, small tests |
-| `t4-small` | ~$0.40 | Short dev, NOT training |
-| `a10g-small` | ~$1.00 | Training, but use HF Jobs not sandboxes |
-| `a10g-large` | ~$2.00 | Larger batch sizes, not needed for this project |
-**Rule**: If a task takes >30 min, it must be an HF Job. Sandboxes are for editing and quick tests only.
----
-## Sandbox Policy (User Mandate)
-> **From this point forward, the user has mandated:**
-1. **Start `cpu-basic` sandbox** at the beginning of every session
-2. **Use `cpu-basic` for**: context, writing code, writing docs, editing files, planning
-3. **Only switch to GPU sandbox** (`t4-small` or `a10g-small`) when performing **smoke tests** for training scripts
-4. **Stop GPU sandbox IMMEDIATELY** after the smoke test completes
-5. **Training tasks ONLY as HF Jobs** — never leave a training process running in a sandbox
-6. **Never leave a GPU sandbox running idle** — this wastes money
-**Why this matters**: A GPU sandbox at $1/hr running empty for 3 hours = $3 wasted for nothing. An HF Job at the same $1/hr actually trains for every billed minute.
----
-## How to Submit HF Jobs Correctly (Research Results)
-### Based on `huggingface.co/docs/hub/jobs-quickstart`:
-**DO NOT use `git clone` for private repos.**
 ```python
-# WRONG ❌
-import subprocess
-subprocess.run(["git", "clone", "https://huggingface.co/spaces/e-rong/til-26-ae"])
-# Fails: git does not read HF_TOKEN env var
-# CORRECT ✅
 from huggingface_hub import snapshot_download
 snapshot_download(
     repo_id="e-rong/til-26-ae",
     repo_type="space",
     local_dir="/app/til-26-ae-repo"
 )
-# snapshot_download auto-uses HF_TOKEN from environment
 ```
-### Script Submission Pattern (What Actually Works)
-**⚠️ CRITICAL DISCOVERY: The `script` parameter in `hf_jobs` becomes a RAW HUB URL.**
-When you call `hf_jobs(script="/app/train.py")`, the job system does NOT upload the local file. Instead, it converts the path to:
-```
-https://huggingface.co/E-Rong/til-26-ae-agent/raw/main/train.py
-```
-and runs it via `uv run <url>`. **This means the file MUST already exist on the Hub repo.**
-**The correct workflow is:**
-```python
-from tools import write, hf_repo_files, hf_jobs
-# Step 1: Write script to sandbox file
-write(path="/app/train.py", content="...")
-# Step 2: ALSO upload to Hub repo so it's persisted and URL-accessible
-hf_repo_files(
-    operation="upload",
-    repo_id="E-Rong/til-26-ae-agent",
-    path="train.py",
-    content=open("/app/train.py").read()
-)
-# Step 3: Submit job referencing the sandbox path
-# The job system will convert this to a Hub raw URL under the hood
-hf_jobs(
-    operation="run",
-    script="/app/train.py",           # ← sandbox file path
-    dependencies=["torch", "sb3-contrib", "gymnasium", "pettingzoo",
-                  "numpy", "huggingface_hub", "pygame", "omegaconf",
-                  "mazelib", "imageio", "imageio-ffmpeg", "supersuit", "psutil"],
-    hardware_flavor="a10g-small",
-    timeout="6h",
-    namespace="E-Rong"              # ← bills to org
-)
-```
-**Verification from `hf_jobs inspect`:**
-```bash
-exec uv run --with torch --with sb3-contrib ... \
-    https://huggingface.co/E-Rong/til-26-ae-agent/raw/main/phase2_resume.py
-```
-The job fetches the script from the Hub, not from the sandbox. The sandbox path is just used to derive the repo/file path.
-**Why this matters**: If you only write to `/app/train.py` and don't upload to the Hub, the job will fail with a 404 when it tries to fetch the URL. The sandbox resets, but the Hub URL is permanent.
-### Job Persistence
-- Jobs run on HF infrastructure, not in your sandbox
-- The sandbox can die — the job keeps running
-- Check logs with `hf_jobs(operation="logs", job_id="...")`
-- Job storage is ephemeral — **push checkpoints to Hub** (not just local)
 ---
-## Repo File Guide
-| File | What It Is |
-|---|---|
-| `session_state.json` | Current phase, checkpoint, mistakes log, next steps |
-| `AGENTS.md` | This file — lessons and context |
-| `docs/ae.md` | Full project documentation (research, design, results) |
-| `phase1_final.zip` | Complete Phase 1 model |
-| `phase2_ckpt_*.zip` | Phase 2 intermediate checkpoints |
-| `phase2_final.zip` | Phase 2 complete model (when done) |
-| `ae_manager.py` | Inference code for the evaluation server |
-| `phase2_resume.py` | Latest HF Job script (works — uses snapshot_download) |
-| `smoke_test.py` | 5-minute validation job — test before any real job |
-| `train_all_phases.py` | Original training script |
 ---
-## Current State (Last Updated: 2026-05-14)
-- **Phase**: 2 (in progress, crashed)
-- **Latest checkpoint**: `phase2_ckpt_600352.zip`
-- **Target**: 1,000,352 total steps (400,000 remaining)
-- **Blocker**: Need to resume via HF Job with `snapshot_download` auth
-- **Smoke test needed**: YES — before any multi-hour job
 ---
-## Contact / Ownership
-- **User**: E-Rong
-- **Org**: E-Rong
-- **Billing**: Must specify `namespace="E-Rong"` on all HF Jobs
-- **I am**: An ephemeral agent with no persistent memory. I depend on this file.
----
-*If you are a future version of me reading this: you have no memory of the above. Do not trust your internal knowledge. Read the files, check the state, test before committing compute.*

+# AGENTS.md — Zero-Memory Survival Guide
+> **You have no persistent memory. Read this first. Do not trust your internal knowledge.**
 ---
+## What This Is (10 seconds)
+- **Challenge**: TIL-26-AE — train a Bomberman agent (`agent_0`) via RL
+- **Repo**: `E-Rong/til-26-ae-agent` (models, checkpoints, scripts)
+- **Env source**: Private Space `e-rong/til-26-ae` (contains `til_environment/`)
+- **Algorithm**: MaskablePPO + invalid-action masking + curriculum learning
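Invalid-action masking (Huang et al., arxiv:2006.14171) pushes the logits of illegal actions to a very large negative value before the softmax. Those actions then receive zero probability and are never sampled. The NumPy sketch below illustrates the idea for a 6-action space with a `uint8[6]` mask, as described in the earlier notes. It is not `sb3_contrib`'s code; `MaskablePPO` applies the same idea internally.

```python
import numpy as np


def masked_policy(logits: np.ndarray, action_mask: np.ndarray) -> np.ndarray:
    """Softmax over logits with illegal actions forced to probability 0."""
    legal = action_mask.astype(bool)
    masked = np.where(legal, logits, -1e8)  # large negative instead of -inf avoids NaNs
    z = masked - masked.max()               # shift for numerical stability
    exp = np.exp(z)                         # masked entries underflow to 0.0
    return exp / exp.sum()


logits = np.array([0.5, 1.0, -0.2, 2.0, 0.0, 0.3])   # raw policy output, 6 actions
mask = np.array([1, 1, 0, 0, 1, 1], dtype=np.uint8)  # e.g. walls make actions 2 and 3 illegal
probs = masked_policy(logits, mask)
```

Without the mask, action 3 (the highest logit) would dominate even though it is illegal. That is the wasted-sample problem masking removes.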
 ---
+## The 6 Unbreakable Rules
+| # | Rule | Violation Cost |
+|---|---|---|
+| 1 | **NEVER train in sandboxes >30 min** | Process dies silently while the empty sandbox keeps billing (~$4.87 already lost across 4 sessions) |
+| 2 | **NEVER `git clone` private repos in HF Jobs** | Job fails: git does not read `HF_TOKEN` |
+| 3 | **NEVER pass a long script inline to `hf_jobs.script`** | Delivery choked on a ~20KB inline script |
+| 4 | **ALWAYS upload script to Hub BEFORE submitting job** | Job fetches script from Hub URL, not sandbox |
+| 5 | **ALWAYS update docs BEFORE starting long jobs** | The next session has zero memory; stale docs mean duplicated work |
+| 6 | **ALWAYS smoke-test before multi-hour jobs** | 5-min test saves hours of failed compute |
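Rules 1, 4, 5 and 6 can be written as a pre-submission gate that runs before any `hf_jobs` call. This is a minimal sketch: `preflight` and its boolean inputs are hypothetical. They would have to be filled from real checks, such as whether the script exists on the Hub. The 60-minute "multi-hour" threshold for rule 6 is an assumption.

```python
def preflight(expected_minutes: float, *, as_hf_job: bool, script_on_hub: bool,
              docs_pushed: bool, smoke_tested: bool) -> list:
    """Return the list of violated rules; an empty list means OK to submit."""
    problems = []
    long_run = expected_minutes > 30
    if long_run and not as_hf_job:
        problems.append("Rule 1: >30 min work must run as an HF Job, not in a sandbox")
    if as_hf_job and not script_on_hub:
        problems.append("Rule 4: upload the script to the Hub before submitting")
    if long_run and not docs_pushed:
        problems.append("Rule 5: push session_state.json / AGENTS.md / docs/ae.md first")
    if expected_minutes > 60 and not smoke_tested:  # "multi-hour" cutoff is an assumption
        problems.append("Rule 6: run the 5-minute smoke test first")
    return problems
```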
 ---
+## Session Startup (do this now)
+1. Read `session_state.json` from `E-Rong/til-26-ae-agent`
+2. Check `hf_jobs ps` for running jobs
+3. Check latest checkpoint on Hub (`phase*_ckpt_*.zip`)
+4. Determine: current phase, remaining steps, next action
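Step 3 needs care. Sorting checkpoint names as strings puts `phase2_ckpt_1000352.zip` before `phase2_ckpt_600352.zip`, so the "last" file is the wrong one. The minimal sketch below parses phase and step numerically. It assumes filenames follow the `phase<N>_ckpt_<steps>.zip` pattern. The Hub listing call `HfApi().list_repo_files(...)` from `huggingface_hub` appears only as a comment, so the helper itself runs offline.

```python
import re
from typing import Iterable, Optional

CKPT_RE = re.compile(r"^phase(\d+)_ckpt_(\d+)\.zip$")


def latest_checkpoint(filenames: Iterable[str]) -> Optional[str]:
    """Return the newest checkpoint path by (phase, step); None if there is none."""
    best_key, best_name = None, None
    for name in filenames:
        m = CKPT_RE.match(name.rsplit("/", 1)[-1])  # match on the basename only
        if m is None:
            continue
        key = (int(m.group(1)), int(m.group(2)))  # numeric, not lexicographic
        if best_key is None or key > best_key:
            best_key, best_name = key, name
    return best_name


# Against the Hub (needs network and HF_TOKEN):
# from huggingface_hub import HfApi
# print(latest_checkpoint(HfApi().list_repo_files("E-Rong/til-26-ae-agent")))
```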
 ---
+## How to Submit an HF Job (the only way that works)
+```python
+from tools import write, hf_repo_files, hf_jobs  # agent tools, not pip packages
+# 1. Write to sandbox
+write(path="/app/train.py", content="...")
+# 2. UPLOAD TO HUB (critical — job fetches from Hub URL)
+hf_repo_files(
+    operation="upload",
+    repo_id="E-Rong/til-26-ae-agent",
+    path="train.py",
+    content=open("/app/train.py").read()
+)
+# 3. Submit job
+hf_jobs(
+    operation="run",
+    script="/app/train.py",   # becomes: https://huggingface.co/E-Rong/til-26-ae-agent/raw/main/train.py
+    dependencies=["torch", "sb3-contrib", "gymnasium", "pettingzoo",
+                  "numpy", "huggingface_hub", "pygame", "omegaconf",
+                  "mazelib", "imageio", "imageio-ffmpeg", "supersuit", "psutil"],
+    hardware_flavor="a10g-small",
+    timeout="6h",
+    namespace="E-Rong"
+)
 ```
+**Why step 2 matters**: `hf_jobs inspect` reveals the job executes:
+```bash
+uv run ... https://huggingface.co/E-Rong/til-26-ae-agent/raw/main/train.py
 ```
+If the file isn't on the Hub, the job 404s.
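The job fetches `https://huggingface.co/<repo>/raw/main/<file>`. A cheap pre-check is therefore to derive that URL from the sandbox path and confirm the file exists on the Hub before calling `hf_jobs`. This is a minimal sketch under two caveats. The path-to-URL mapping (basename placed under `raw/main/`) only mirrors what `hf_jobs inspect` showed and is an assumption about the tool. `huggingface_hub.file_exists` is a real function in recent `huggingface_hub` releases and is imported lazily because it needs network access.

```python
from pathlib import PurePosixPath

REPO_ID = "E-Rong/til-26-ae-agent"


def hub_raw_url(sandbox_path: str, repo_id: str = REPO_ID) -> str:
    """URL the job will fetch (assumed mapping: basename under raw/main/)."""
    return f"https://huggingface.co/{repo_id}/raw/main/{PurePosixPath(sandbox_path).name}"


def script_is_on_hub(sandbox_path: str, repo_id: str = REPO_ID) -> bool:
    """True if the file the job will fetch already exists in the Hub repo."""
    from huggingface_hub import file_exists  # lazy import: needs network + HF_TOKEN
    return file_exists(repo_id, PurePosixPath(sandbox_path).name)
```

If `script_is_on_hub("/app/train.py")` returns `False`, the job would 404, so upload before submitting.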
 ---
+## How to Access the Private Env in a Job
 ```python
 from huggingface_hub import snapshot_download
 snapshot_download(
     repo_id="e-rong/til-26-ae",
     repo_type="space",
     local_dir="/app/til-26-ae-repo"
 )
+# Then walk to find pyproject.toml and pip install -e .
 ```
+`snapshot_download` auto-uses `HF_TOKEN`. `git clone` does not.
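The step "walk to find pyproject.toml and pip install -e ." can be made concrete as below. This is a minimal sketch that assumes the env package is the shallowest `pyproject.toml` inside the downloaded Space. If the Space holds several packages, select the directory by name instead. `find_package_root` and `install_editable` are hypothetical helper names.

```python
import os
import subprocess
import sys
from pathlib import Path
from typing import Optional


def find_package_root(root: str) -> Optional[str]:
    """Shallowest directory under `root` (inclusive) that contains pyproject.toml."""
    best_depth, best_dir = None, None
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal when depths tie
        if "pyproject.toml" in filenames:
            depth = len(Path(dirpath).relative_to(root).parts)  # root itself -> 0
            if best_depth is None or depth < best_depth:
                best_depth, best_dir = depth, dirpath
    return best_dir


def install_editable(root: str = "/app/til-26-ae-repo") -> str:
    """Editable-install the package found under `root`; returns its directory."""
    pkg = find_package_root(root)
    if pkg is None:
        raise FileNotFoundError(f"no pyproject.toml under {root}")
    # Installs into the interpreter running this script. Under `uv run`, pip may be
    # absent from the environment; `uv pip install -e <dir>` is the equivalent there.
    subprocess.run([sys.executable, "-m", "pip", "install", "-e", pkg], check=True)
    return pkg
```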
+---
+## Docs Update Checklist (before ANY job >30 min)
+- [ ] `session_state.json` — phase, job_id, script name, hardware, timeout, expected completion
+- [ ] `AGENTS.md` — any new mistakes/API gotchas learned this session
+- [ ] `docs/ae.md` — research results, completed phase metrics
+- [ ] Push all three to Hub BEFORE calling `hf_jobs`
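The `session_state.json` item can be written and pushed in one step. This is a minimal sketch: the field names below mirror the checklist but are a hypothetical schema, and the real file may differ. `HfApi.upload_file` from `huggingface_hub` accepts raw bytes through `path_or_fileobj`.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class SessionState:  # hypothetical schema mirroring the checklist
    phase: int
    status: str
    script: str
    hardware: str
    timeout: str
    job_id: Optional[str] = None               # fill in after hf_jobs returns
    expected_completion: Optional[str] = None  # e.g. an ISO-8601 UTC timestamp


def to_json(state: SessionState) -> str:
    return json.dumps(asdict(state), indent=2, sort_keys=True)


def push_state(state: SessionState, repo_id: str = "E-Rong/til-26-ae-agent") -> None:
    """Upload session_state.json to the Hub (needs network + HF_TOKEN)."""
    from huggingface_hub import HfApi
    HfApi().upload_file(
        path_or_fileobj=to_json(state).encode("utf-8"),
        path_in_repo="session_state.json",
        repo_id=repo_id,
        commit_message=f"session_state: phase {state.phase} {state.status}",
    )
```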
+---
+## Technical Gotchas
+| Gotcha | Correct | Wrong |
+|---|---|---|
+| **Wrapper order** | `Monitor(ActionMasker(base_env, mask_fn))` | `ActionMasker(Monitor(base_env), mask_fn)`: `get_action_masks()` fails |
+| **Env install** | `snapshot_download` + walk for `pyproject.toml` | `git clone` of private space |
+| **Script delivery** | Upload to Hub, submit sandbox path | Inline 20KB string or sandbox-only file |
+| **Auth** | `HF_TOKEN` env var (auto-injected in Jobs) | Passing token manually in git URLs |
+---
+## Cost Table
+| Hardware | $/hr | Use For |
+|---|---|---|
+| `cpu-basic` | ~$0.05 | Writing code, docs, planning |
+| `t4-small` | ~$0.40 | Smoke tests ONLY |
+| `a10g-small` | ~$1.00 | Training via HF Jobs |
+**Stop GPU sandboxes immediately after smoke tests.** An idle GPU sandbox keeps billing for nothing (~$1/hr on `a10g-small`, ~$0.40/hr on `t4-small`).
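The table gives a quick worst-case estimate before you choose a timeout. Billed cost is roughly rate × hours, so a `6h` `a10g-small` job caps at about $6, and a GPU sandbox left idle for 3 hours costs about $3. A small sketch using the approximate rates above; these are not official prices, and the `"6h"`/`"30m"` timeout format is assumed from the `hf_jobs` call shown earlier.

```python
# Approximate $/hr from the cost table (not official pricing).
RATES = {"cpu-basic": 0.05, "t4-small": 0.40, "a10g-small": 1.00}


def parse_timeout_hours(timeout: str) -> float:
    """Parse '6h' / '30m' style timeouts into hours."""
    unit, value = timeout[-1], float(timeout[:-1])
    if unit == "h":
        return value
    if unit == "m":
        return value / 60
    raise ValueError(f"unsupported timeout: {timeout!r}")


def max_cost(hardware: str, timeout: str) -> float:
    """Upper bound on billed dollars if the job runs until its timeout."""
    return round(RATES[hardware] * parse_timeout_hours(timeout), 2)
```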
 ---
+## Curriculum Summary
+| Phase | Opponent | Steps | Status |
+|---|---|---|---|
+| 1 | Random | 500k | ✅ Complete (92% win rate) |
+| 2 | Random + exploration shaping | 500k | Check `session_state.json` |
+| 3 | Rule-based curriculum | 1M | Pending |
+Key papers: `arxiv:2407.00662` (Pommerman curriculum + adaptive annealing), `arxiv:2006.14171` (invalid action masking).
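The notes do not record the exact Phase 2 shaping term. One common form of visit-count shaping is shown below purely as an illustrative sketch. It pays a bonus β/√N for entering a cell visited N times this episode, and it anneals β linearly to zero so the final policy optimizes the game reward alone. The class name, `beta0`, and the 500k annealing horizon (matching Phase 2's length) are assumptions, not the project's settings. The linear schedule is also not the adaptive annealing from arxiv:2407.00662.

```python
import math
from collections import Counter


class VisitCountBonus:
    """Hypothetical exploration bonus beta / sqrt(visits) with linearly annealed beta."""

    def __init__(self, beta0: float = 0.05, anneal_steps: int = 500_000):
        self.beta0 = beta0
        self.anneal_steps = anneal_steps
        self.visits = Counter()

    def reset(self) -> None:
        """Call at episode start so counts are per-episode."""
        self.visits.clear()

    def beta(self, global_step: int) -> float:
        frac = min(global_step / self.anneal_steps, 1.0)
        return self.beta0 * (1.0 - frac)

    def __call__(self, cell: tuple, global_step: int) -> float:
        self.visits[cell] += 1
        return self.beta(global_step) / math.sqrt(self.visits[cell])
```

In use, the shaped reward would be `env_reward + bonus(agent_cell, model.num_timesteps)`. Revisiting the same cell pays less each time, which discourages camping.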
 ---
+## File Guide
+| File | Purpose |
+|---|---|
+| `session_state.json` | Current phase, checkpoints, mistakes, next steps |
+| `docs/ae.md` | Full research, design, results |
+| `phase1_final.zip` | Phase 1 complete checkpoint |
+| `phase2_ckpt_*.zip` | Phase 2 intermediates |
+| `phase2_resume.py` | Working HF Job script |
+| `phase3_curriculum.py` | Ready-to-submit Phase 3 script |
+| `smoke_test.py` | 5-min validation |
 ---
+## Contact
+- **User**: E-Rong | **Org**: E-Rong
+- **Billing namespace**: `E-Rong` (required on all `hf_jobs`)
+- **You are**: An ephemeral agent with no memory. This file is your only brain.
+*Read the files. Check the state. Test before committing compute. Update docs before every job.*