E-Rong committed on
Commit
9bdda09
·
verified ·
1 Parent(s): 1e7c6af

Compact AGENTS.md into zero-memory survival guide

Files changed (1)
  1. AGENTS.md +95 -253
AGENTS.md CHANGED
@@ -1,311 +1,153 @@
1
- # AGENTS.md — Context & Lessons for Future Sessions
2
 
3
- > This file exists because sandboxes reset and I (the agent) lose all memory.
4
- > **READ THIS FIRST** before doing anything on this project.
5
 
6
  ---
7
 
8
- ## What This Project Is
9
 
10
- - **Challenge**: TIL-26-AE (The Intelligent League Automated Exploration)
11
- - **Game**: Multi-agent Bomberman on a 16×16 grid
12
- - **My Role**: Train `agent_0` via RL to compete autonomously
13
- - **Main Repo**: `E-Rong/til-26-ae-agent` (models, checkpoints, scripts, docs)
14
- - **Space**: `e-rong/til-26-ae` (evaluation server with `ae/src/ae_manager.py`)
15
- - **TIL Source**: Private Space `e-rong/til-26-ae` — contains `til_environment/` module
16
 
17
  ---
18
 
19
- ## CRITICAL: What Killed Training & Cost Money
20
 
21
- ### NEVER USE SANDBOXES FOR TRAINING > 30 MINUTES
22
-
23
- Sandboxes are **interactive dev environments**. They:
24
- - Recycle after inactivity / timeout
25
- - Kill processes silently
26
- - **Keep billing you while empty** after the process dies
27
-
28
- **Damage done**: ~$4.87 wasted across 4 sandbox sessions where training died but billing continued.
29
-
30
- ### ✅ ALWAYS USE HF JOBS FOR BATCH TRAINING
31
-
32
- - Persistent GPU allocation
33
- - Runs until completion (or your timeout)
34
- - Fails visibly if something breaks (no silent empty billing)
35
- - Must set `namespace="E-Rong"` to bill the org, not the user
36
-
37
- ### ❌ NEVER `git clone` A PRIVATE REPO IN AN HF JOB
38
-
39
- `git clone https://huggingface.co/spaces/...` fails because git does not read `HF_TOKEN`.
40
-
41
- **Use instead**:
42
- ```python
43
- from huggingface_hub import snapshot_download
44
- snapshot_download(
45
- repo_id='e-rong/til-26-ae',
46
- repo_type='space',
47
- local_dir='/app/til-26-ae-repo'
48
- )
49
- ```
50
- `snapshot_download` auto-uses the `HF_TOKEN` env var.
51
-
52
- ### ✅ ALWAYS SMOKE-TEST A JOB BEFORE THE FULL RUN
53
-
54
- Submit a 5-minute job that:
55
- 1. Downloads the TIL repo
56
- 2. Installs deps
57
- 3. Runs 100 training steps
58
- 4. Saves a dummy checkpoint to the Hub
59
-
60
- Only after this succeeds, submit the multi-hour job.
61
 
62
  ---
63
 
64
- ## Session Startup Checklist
65
 
66
- Before doing **anything** on this project:
67
-
68
- 1. [ ] Read `session_state.json` from `E-Rong/til-26-ae-agent`
69
- 2. [ ] Read this file (`AGENTS.md`)
70
- 3. [ ] Check latest checkpoint on Hub (sort `phase*_ckpt_*.zip` files)
71
- 4. [ ] Determine current phase and remaining steps
72
- 5. [ ] If training needed: write script to sandbox, **smoke-test in HF Job first**
73
 
74
  ---
75
 
76
- ## 📋 ALWAYS UPDATE DOCS BEFORE STARTING LONG-RUNNING TASKS
77
-
78
- > **This rule prevents lost context when sessions crash or reset.**
79
-
80
- Before submitting any multi-hour HF Job or starting any long-running compute:
81
-
82
- 1. **Update `session_state.json`** with:
83
- - Current phase and status
84
- - What you are about to do (job_id if resuming, script name, hardware, timeout)
85
- - Why you are doing it (link to research/decisions)
86
- - Expected completion time
87
-
88
- 2. **Update `AGENTS.md`** if you learned anything new:
89
- - New mistakes or fixes
90
- - New technical decisions with rationale
91
- - Cost lessons
92
- - API gotchas
93
-
94
- 3. **Update `docs/ae.md`** with research findings:
95
- - New papers read (arxiv IDs, key insights)
96
- - New datasets or methods discovered
97
- - Results from completed phases
98
-
99
- 4. **Push all updates to the Hub** BEFORE starting the job:
100
- ```python
101
- hf_repo_files(operation="upload", repo_id="E-Rong/til-26-ae-agent",
102
- path="session_state.json", content=...)
103
- ```
104
-
105
- **Why this matters**: If your session resets while a job is running, the next version of you has ZERO memory. The only way to reconstruct state is from the Hub. If docs are stale, you'll waste time (and money) redoing work or making the same mistakes.
106
-
107
- **This rule applies to**:
108
- - Any HF Job with `timeout > 30m`
109
- - Any smoke test (even 5-minute ones — document what you're testing)
110
- - Any evaluation run > 100 episodes
111
- - Any data processing that takes > 15 minutes
112
 
113
- ---
114
-
115
- ## Technical Decisions That Work
116
 
117
- ### MaskablePPO + Action Masking
118
- - `sb3_contrib.MaskablePPO` with `ActionMasker`
119
- - Bomberman has `action_mask: uint8[6]` — walls/edges make moves illegal
120
- - Standard PPO wastes ~30-40% of its samples on illegal actions early on
121
- - **Papers**: Huang et al. "Superstition, Imagination, and the Invalid Action Problem" (arxiv:2006.14171)
 
 
122
 
123
- ### Observation Flattening
124
- 1511-dim vector from dict observation:
125
- ```
126
- agent_viewcone: 7×5×25 = 875
127
- base_viewcone: 5×5×25 = 625
128
- direction, location[2], base_location[2], health, frozen_ticks,
129
- base_health, team_resources, team_bombs, step = 11 scalars
130
- Total: 1511
 
 
 
131
  ```
132
 
133
- ### Wrapper Order (CRITICAL)
134
- ```python
135
- # CORRECT
136
- env = ActionMasker(base_env, lambda e: e.action_masks())
137
- env = Monitor(env)
138
-
139
- # WRONG — Monitor blocks action_masks() exposure
140
- env = ActionMasker(Monitor(base_env), ...) # DON'T DO THIS
141
  ```
142
-
143
- ### 3-Phase Curriculum
144
- | Phase | Opponent | Duration | Purpose |
145
- |---|---|---|---|
146
- | 1 | Random | 500k | Learn basics |
147
- | 2 | Random + visit-count shaping | 500k | Prevent camping |
148
- | 3 | Rule-based curriculum | 1M | Generalize to structured opponents |
149
-
150
- ### Checkpointing Every 50k Steps
151
- - Local + Hub push via `HfApi.upload_file()`
152
- - Saved the project when sandboxes reset at 400k and 600k steps
153
-
154
- ---
155
-
156
- ## Technical Decisions That Failed
157
-
158
- | Decision | Why It Failed | Fix |
159
- |---|---|---|
160
- | Training in sandboxes | Process died, empty sandbox kept billing | Use HF Jobs |
161
- | `git clone` in HF Job | No auth for private repo | `snapshot_download` |
162
- | Inline 20KB script in `hf_jobs.script` | Delivery mechanism choked | Write to sandbox file first, submit path |
163
- | No session state on Hub | Lost track of progress across resets | `session_state.json` + this file |
164
- | `Monitor` inside `ActionMasker` | `get_action_masks()` failed | `ActionMasker` → `Monitor` order |
165
 
166
  ---
167
 
168
- ## Cost Awareness
169
-
170
- | Hardware | $/hr | Good For |
171
- |---|---|---|
172
- | `cpu-basic` | ~$0.05 | Writing scripts, reading files, small tests |
173
- | `t4-small` | ~$0.40 | Short dev, NOT training |
174
- | `a10g-small` | ~$1.00 | Training, but use HF Jobs not sandboxes |
175
- | `a10g-large` | ~$2.00 | Larger batch sizes, not needed for this project |
176
-
177
- **Rule**: If a task takes >30 min, it must be an HF Job. Sandboxes are for editing and quick tests only.
178
-
179
- ---
180
-
181
- ## Sandbox Policy (User Mandate)
182
-
183
- > **From this point forward, the user has mandated:**
184
-
185
- 1. **Start `cpu-basic` sandbox** at the beginning of every session
186
- 2. **Use `cpu-basic` for**: context, writing code, writing docs, editing files, planning
187
- 3. **Only switch to GPU sandbox** (`t4-small` or `a10g-small`) when performing **smoke tests** for training scripts
188
- 4. **Stop GPU sandbox IMMEDIATELY** after the smoke test completes
189
- 5. **Training tasks ONLY as HF Jobs** — never leave a training process running in a sandbox
190
- 6. **Never leave a GPU sandbox running idle** — this wastes money
191
-
192
- **Why this matters**: A GPU sandbox at $1/hr running empty for 3 hours = $3 wasted for nothing. An HF Job at the same $1/hr actually trains for every billed minute.
193
-
194
- ---
195
-
196
- ## How to Submit HF Jobs Correctly (Research Results)
197
-
198
- ### Based on `huggingface.co/docs/hub/jobs-quickstart`:
199
-
200
- **DO NOT use `git clone` for private repos.**
201
 
202
  ```python
203
- # WRONG ❌
204
- import subprocess
205
- subprocess.run(["git", "clone", "https://huggingface.co/spaces/e-rong/til-26-ae"])
206
- # Fails: git does not read HF_TOKEN env var
207
-
208
- # CORRECT ✅
209
  from huggingface_hub import snapshot_download
210
  snapshot_download(
211
  repo_id="e-rong/til-26-ae",
212
  repo_type="space",
213
  local_dir="/app/til-26-ae-repo"
214
  )
215
- # snapshot_download auto-uses HF_TOKEN from environment
216
  ```
217
 
218
- ### Script Submission Pattern (What Actually Works)
219
 
220
- **⚠️ CRITICAL DISCOVERY: The `script` parameter in `hf_jobs` becomes a RAW HUB URL.**
221
 
222
- When you call `hf_jobs(script="/app/train.py")`, the job system does NOT upload the local file. Instead, it converts the path to:
223
- ```
224
- https://huggingface.co/E-Rong/til-26-ae-agent/raw/main/train.py
225
- ```
226
- and runs it via `uv run <url>`. **This means the file MUST already exist on the Hub repo.**
227
 
228
- **The correct workflow is:**
 
 
 
229
 
230
- ```python
231
- from tools import write, hf_repo_files, hf_jobs
232
 
233
- # Step 1: Write script to sandbox file
234
- write(path="/app/train.py", content="...")
235
 
236
- # Step 2: ALSO upload to Hub repo so it's persisted and URL-accessible
237
- hf_repo_files(
238
- operation="upload",
239
- repo_id="E-Rong/til-26-ae-agent",
240
- path="train.py",
241
- content=open("/app/train.py").read()
242
- )
243
 
244
- # Step 3: Submit job referencing the sandbox path
245
- # The job system will convert this to a Hub raw URL under the hood
246
- hf_jobs(
247
- operation="run",
248
- script="/app/train.py", # ← sandbox file path
249
- dependencies=["torch", "sb3-contrib", "gymnasium", "pettingzoo",
250
- "numpy", "huggingface_hub", "pygame", "omegaconf",
251
- "mazelib", "imageio", "imageio-ffmpeg", "supersuit", "psutil"],
252
- hardware_flavor="a10g-small",
253
- timeout="6h",
254
- namespace="E-Rong" # ← bills to org
255
- )
256
- ```
257
 
258
- **Verification from `hf_jobs inspect`:**
259
- ```bash
260
- exec uv run --with torch --with sb3-contrib ... \
261
- https://huggingface.co/E-Rong/til-26-ae-agent/raw/main/phase2_resume.py
262
- ```
263
- The job fetches the script from the Hub, not from the sandbox. The sandbox path is just used to derive the repo/file path.
264
 
265
- **Why this matters**: If you only write to `/app/train.py` and don't upload to the Hub, the job will fail with a 404 when it tries to fetch the URL. The sandbox resets, but the Hub URL is permanent.
 
 
 
 
266
 
267
- ### Job Persistence
268
- - Jobs run on HF infrastructure, not in your sandbox
269
- - The sandbox can die — the job keeps running
270
- - Check logs with `hf_jobs(operation="logs", job_id="...")`
271
- - Job storage is ephemeral — **push checkpoints to Hub** (not just local)
272
 
273
  ---
274
 
275
- ## Repo File Guide
276
 
277
- | File | What It Is |
278
- |---|---|
279
- | `session_state.json` | Current phase, checkpoint, mistakes log, next steps |
280
- | `AGENTS.md` | This file: lessons and context |
281
- | `docs/ae.md` | Full project documentation (research, design, results) |
282
- | `phase1_final.zip` | Complete Phase 1 model |
283
- | `phase2_ckpt_*.zip` | Phase 2 intermediate checkpoints |
284
- | `phase2_final.zip` | Phase 2 complete model (when done) |
285
- | `ae_manager.py` | Inference code for the evaluation server |
286
- | `phase2_resume.py` | Latest HF Job script (works — uses snapshot_download) |
287
- | `smoke_test.py` | 5-minute validation job — test before any real job |
288
- | `train_all_phases.py` | Original training script |
289
 
290
  ---
291
 
292
- ## Current State (Last Updated: 2026-05-14)
293
 
294
- - **Phase**: 2 (in progress, crashed)
295
- - **Latest checkpoint**: `phase2_ckpt_600352.zip`
296
- - **Target**: 1,000,352 total steps (400,000 remaining)
297
- - **Blocker**: Need to resume via HF Job with `snapshot_download` auth
298
- - **Smoke test needed**: YES before any multi-hour job
 
 
 
 
299
 
300
  ---
301
 
302
- ## Contact / Ownership
303
-
304
- - **User**: E-Rong
305
- - **Org**: E-Rong
306
- - **Billing**: Must specify `namespace="E-Rong"` on all HF Jobs
307
- - **I am**: An ephemeral agent with no persistent memory. I depend on this file.
308
 
309
- ---
 
 
310
 
311
- *If you are a future version of me reading this: you have no memory of the above. Do not trust your internal knowledge. Read the files, check the state, test before committing compute.*
 
1
+ # AGENTS.md — Zero-Memory Survival Guide
2
 
3
+ > **You have no persistent memory. Read this first. Do not trust your internal knowledge.**
 
4
 
5
  ---
6
 
7
+ ## What This Is (10 seconds)
8
 
9
+ - **Challenge**: TIL-26-AE: train a Bomberman agent (`agent_0`) via RL
10
+ - **Repo**: `E-Rong/til-26-ae-agent` (models, checkpoints, scripts)
11
+ - **Env source**: Private Space `e-rong/til-26-ae` (contains `til_environment/`)
12
+ - **Algorithm**: MaskablePPO + invalid-action masking + curriculum learning
 
 
13
 
14
  ---
15
 
16
+ ## The 6 Unbreakable Rules
17
 
18
+ | # | Rule | Violation Cost |
19
+ |---|---|---|
20
+ | 1 | **NEVER train in sandboxes >30 min** | ~$4.87 already wasted across 4 sessions on empty, recycled containers that kept billing |
21
+ | 2 | **NEVER `git clone` private repos in HF Jobs** | Job fails instantly — git ignores `HF_TOKEN` |
22
+ | 3 | **NEVER pass inline scripts larger than a few KB to `hf_jobs.script`** | Delivery chokes at ~20KB |
23
+ | 4 | **ALWAYS upload script to Hub BEFORE submitting job** | Job fetches script from Hub URL, not sandbox |
24
+ | 5 | **ALWAYS update docs BEFORE starting long jobs** | The next you has zero memory; stale docs = duplicated work |
25
+ | 6 | **ALWAYS smoke-test before multi-hour jobs** | 5-min test saves hours of failed compute |
26
 
27
  ---
28
 
29
+ ## Session Startup (do this now)
30
 
31
+ 1. Read `session_state.json` from `E-Rong/til-26-ae-agent`
32
+ 2. Check `hf_jobs ps` for running jobs
33
+ 3. Check latest checkpoint on Hub (`phase*_ckpt_*.zip`)
34
+ 4. Determine: current phase, remaining steps, next action
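Steps 3 and 4 can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the project's actual tooling: the helper names `latest_checkpoint` and `remaining_steps` are hypothetical, the filename pattern is inferred from names such as `phase2_ckpt_600352.zip`, and the file listing and target step count below are illustrative values.

```python
import re

# Assumed filename pattern, inferred from names like phase2_ckpt_600352.zip.
CKPT_RE = re.compile(r"^phase(\d+)_ckpt_(\d+)\.zip$")


def latest_checkpoint(filenames):
    """Return (phase, step, filename) for the newest checkpoint, or None."""
    parsed = []
    for name in filenames:
        match = CKPT_RE.match(name)
        if match:
            parsed.append((int(match.group(1)), int(match.group(2)), name))
    # Compare numerically: a plain string sort would rank 1000352 below 600352.
    return max(parsed, default=None)


def remaining_steps(current_step, target_step):
    """Steps still to train; never negative."""
    return max(target_step - current_step, 0)


# Illustrative listing; in practice this would come from the Hub repo.
files = [
    "phase1_final.zip",
    "phase2_ckpt_550352.zip",
    "phase2_ckpt_600352.zip",
    "AGENTS.md",
]
phase, step, name = latest_checkpoint(files)
print(phase, step, name)
print(remaining_steps(step, 1_000_352))
```

The listing itself could be obtained with `HfApi().list_repo_files("E-Rong/til-26-ae-agent")` from `huggingface_hub`; it is hard-coded here so the sketch runs offline.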
 
 
 
35
 
36
  ---
37
 
38
+ ## How to Submit an HF Job (the only way that works)
39
 
40
+ ```python
41
+ # 1. Write to sandbox
42
+ write(path="/app/train.py", content="...")
43
 
44
+ # 2. UPLOAD TO HUB (critical — job fetches from Hub URL)
45
+ hf_repo_files(
46
+ operation="upload",
47
+ repo_id="E-Rong/til-26-ae-agent",
48
+ path="train.py",
49
+ content=open("/app/train.py").read()
50
+ )
51
 
52
+ # 3. Submit job
53
+ hf_jobs(
54
+ operation="run",
55
+ script="/app/train.py", # becomes: https://huggingface.co/E-Rong/til-26-ae-agent/raw/main/train.py
56
+ dependencies=["torch", "sb3-contrib", "gymnasium", "pettingzoo",
57
+ "numpy", "huggingface_hub", "pygame", "omegaconf",
58
+ "mazelib", "imageio", "imageio-ffmpeg", "supersuit", "psutil"],
59
+ hardware_flavor="a10g-small",
60
+ timeout="6h",
61
+ namespace="E-Rong"
62
+ )
63
  ```
64
 
65
+ **Why step 2 matters**: `hf_jobs inspect` reveals the job executes:
66
+ ```bash
67
+ uv run ... https://huggingface.co/E-Rong/til-26-ae-agent/raw/main/train.py
 
 
 
 
 
68
  ```
69
+ If the file isn't on the Hub, the job 404s.
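To make the path-to-URL conversion concrete, here is a small sketch of the mapping as described above. The helper `hub_raw_url` is hypothetical, and the rule that only the file name survives (`/app/train.py` becomes `train.py`) is an assumption drawn from the single example in this file, not a documented behavior of the job system.

```python
from pathlib import PurePosixPath


def hub_raw_url(repo_id, sandbox_path, revision="main"):
    """Guess the raw Hub URL a job would fetch for a sandbox script path."""
    # Assumption: only the file name is kept, as in /app/train.py -> train.py.
    filename = PurePosixPath(sandbox_path).name
    return f"https://huggingface.co/{repo_id}/raw/{revision}/{filename}"


print(hub_raw_url("E-Rong/til-26-ae-agent", "/app/train.py"))
```

Printing the derived URL before submitting makes it easy to confirm that a file with that name has actually been uploaded to the repo.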
70
 
71
  ---
72
 
73
+ ## How to Access the Private Env in a Job
74
 
75
  ```python
 
 
 
 
 
 
76
  from huggingface_hub import snapshot_download
77
  snapshot_download(
78
  repo_id="e-rong/til-26-ae",
79
  repo_type="space",
80
  local_dir="/app/til-26-ae-repo"
81
  )
82
+ # Then walk local_dir to find pyproject.toml and run `pip install -e .` in that directory
83
  ```
84
 
85
+ `snapshot_download` auto-uses `HF_TOKEN`. `git clone` does not.
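The "walk to find `pyproject.toml`" step could look roughly like the sketch below. The helper names `find_project_root` and `install_editable` are hypothetical, and the sketch assumes the downloaded Space contains exactly one installable project; it stops at the first `pyproject.toml` it finds.

```python
import os
import subprocess
import sys


def find_project_root(top):
    """Return the first directory under `top` containing pyproject.toml, or None."""
    for dirpath, dirnames, filenames in os.walk(top):
        dirnames.sort()  # visit subdirectories in a deterministic order
        if "pyproject.toml" in filenames:
            return dirpath
    return None


def install_editable(project_dir):
    """Run `pip install -e <project_dir>` with the current interpreter."""
    subprocess.run(
        [sys.executable, "-m", "pip", "install", "-e", project_dir],
        check=True,  # raise if pip fails, so the job fails visibly
    )


def install_env(local_dir):
    """Locate the project inside a downloaded snapshot and install it."""
    root = find_project_root(local_dir)
    if root is None:
        raise FileNotFoundError(f"no pyproject.toml found under {local_dir}")
    install_editable(root)
```

In a job script, `install_env("/app/til-26-ae-repo")` would be called right after `snapshot_download` returns.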
86
 
87
+ ---
88
 
89
+ ## Docs Update Checklist (before ANY job >30 min)
 
 
 
 
90
 
91
+ - [ ] `session_state.json` — phase, job_id, script name, hardware, timeout, expected completion
92
+ - [ ] `AGENTS.md` — any new mistakes/API gotchas learned this session
93
+ - [ ] `docs/ae.md` — research results, completed phase metrics
94
+ - [ ] Push all three to Hub BEFORE calling `hf_jobs`
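As an illustration of the first checklist item, the sketch below builds a state record with the fields listed above. The key names and the helper `build_session_state` are hypothetical; the real `session_state.json` schema is whatever the repo already uses, so treat this only as a shape to adapt.

```python
import json
from datetime import datetime, timedelta, timezone


def build_session_state(phase, job_id, script, hardware, timeout_hours):
    """Assemble a JSON-serializable record of what is about to run."""
    now = datetime.now(timezone.utc)
    return {
        "phase": phase,
        "job_id": job_id,  # None until the job has been submitted
        "script": script,
        "hardware": hardware,
        "timeout": f"{timeout_hours}h",
        "updated_at": now.isoformat(),
        # Worst case: the job runs until its timeout.
        "expected_completion": (now + timedelta(hours=timeout_hours)).isoformat(),
    }


state = build_session_state(2, None, "phase2_resume.py", "a10g-small", 6)
print(json.dumps(state, indent=2))
```

The resulting JSON string is what would be passed as `content` when uploading `session_state.json` to the Hub.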
95
 
96
+ ---
 
97
 
98
+ ## Technical Gotchas
 
99
 
100
+ | Gotcha | Correct | Wrong |
101
+ |---|---|---|
102
+ | **Wrapper order** | `ActionMasker(base_env)` then `Monitor(env)` | `ActionMasker(Monitor(base_env))` — masks break |
103
+ | **Env install** | `snapshot_download` + walk for `pyproject.toml` | `git clone` of private space |
104
+ | **Script delivery** | Upload to Hub, submit sandbox path | Inline 20KB string or sandbox-only file |
105
+ | **Auth** | `HF_TOKEN` env var (auto-injected in Jobs) | Passing token manually in git URLs |
 
106
 
107
+ ---
108
 
109
+ ## Cost Table
 
 
 
 
 
110
 
111
+ | Hardware | $/hr | Use For |
112
+ |---|---|---|
113
+ | `cpu-basic` | ~$0.05 | Writing code, docs, planning |
114
+ | `t4-small` | ~$0.40 | Smoke tests ONLY |
115
+ | `a10g-small` | ~$1.00 | Training via HF Jobs |
116
 
117
+ **Stop GPU sandboxes immediately after smoke tests.** An idle `a10g-small` sandbox burns ~$1/hr for nothing.
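A quick way to sanity-check a planned run against the cost table is to multiply the approximate hourly rate by the expected duration. The rates below are copied from the table and are approximate; the helper `estimated_cost` is hypothetical.

```python
# Approximate hourly rates in USD, copied from the cost table.
HOURLY_RATE = {
    "cpu-basic": 0.05,
    "t4-small": 0.40,
    "a10g-small": 1.00,
}


def estimated_cost(hardware, hours):
    """Approximate cost in USD of running `hardware` for `hours`."""
    return round(HOURLY_RATE[hardware] * hours, 2)


print(estimated_cost("a10g-small", 3))  # GPU sandbox left idle for 3 hours
print(estimated_cost("a10g-small", 6))  # job that runs to a 6h timeout
```

The two calls cost the same per hour; the difference is that the job spends those hours training while the idle sandbox produces nothing.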
 
 
 
 
118
 
119
  ---
120
 
121
+ ## Curriculum Summary
122
 
123
+ | Phase | Opponent | Steps | Status |
124
+ |---|---|---|---|
125
+ | 1 | Random | 500k | Complete (92% win rate) |
126
+ | 2 | Random + exploration shaping | 500k | Check `session_state.json` |
127
+ | 3 | Rule-based curriculum | 1M | Pending |
128
+
129
+ Key papers: `arxiv:2407.00662` (Pommerman curriculum + adaptive annealing), `arxiv:2006.14171` (invalid action masking).
 
 
 
 
 
130
 
131
  ---
132
 
133
+ ## File Guide
134
 
135
+ | File | Purpose |
136
+ |---|---|
137
+ | `session_state.json` | Current phase, checkpoints, mistakes, next steps |
138
+ | `docs/ae.md` | Full research, design, results |
139
+ | `phase1_final.zip` | Phase 1 complete checkpoint |
140
+ | `phase2_ckpt_*.zip` | Phase 2 intermediates |
141
+ | `phase2_resume.py` | Working HF Job script |
142
+ | `phase3_curriculum.py` | Ready-to-submit Phase 3 script |
143
+ | `smoke_test.py` | 5-min validation |
144
 
145
  ---
146
 
147
+ ## Contact
 
 
 
 
 
148
 
149
+ - **User**: E-Rong | **Org**: E-Rong
150
+ - **Billing namespace**: `E-Rong` (required on all `hf_jobs`)
151
+ - **You are**: An ephemeral agent with no memory. This file is your only brain.
152
 
153
+ *Read the files. Check the state. Test before committing compute. Update docs before every job.*