silent: a JEPA world model that plays predator by listening

A 13M-parameter Joint Embedding Predictive Architecture (JEPA) world model, trained on a custom predator-prey environment to predict next-step embeddings of audio observations. The predator senses the world through four cardioid microphones (N/E/S/W) mounted on its body. It hunts the player by choosing thrust and sonar-ping actions.
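
To make the four-microphone sensing concrete, here is a minimal sketch of how cardioid capsules facing N/E/S/W would weight a single sound source. It assumes the textbook first-order cardioid pattern g = 0.5 * (1 + cos(delta)) and a "north = +y" body frame. The name `cardioid_gains` and the angle convention are illustrative and not taken from the repository.

```python
import math

# Assumption: N/E/S/W face +y/+x/-y/-x in the predator's body frame.
# Angles are in radians, counter-clockwise from +x (east).
MIC_AZIMUTHS = {
    "N": math.pi / 2,
    "E": 0.0,
    "S": -math.pi / 2,
    "W": math.pi,
}


def cardioid_gains(source_xy, listener_xy, heading=0.0):
    """Hypothetical helper: per-mic gains in [0, 1] using g = 0.5 * (1 + cos(delta)).

    `heading` rotates the microphone array relative to the world frame.
    Distance attenuation is deliberately omitted from this sketch.
    """
    dx = source_xy[0] - listener_xy[0]
    dy = source_xy[1] - listener_xy[1]
    bearing = math.atan2(dy, dx) - heading  # source direction in the body frame
    return {
        name: 0.5 * (1.0 + math.cos(bearing - az))
        for name, az in MIC_AZIMUTHS.items()
    }


gains = cardioid_gains(source_xy=(0.0, 5.0), listener_xy=(0.0, 0.0))
print({k: round(v, 3) for k, v in gains.items()})
# -> {'N': 1.0, 'E': 0.5, 'S': 0.0, 'W': 0.5}
```

In this toy model, a source due north saturates the N capsule and is silent on S. The level differences between opposing channels (N vs S, E vs W) therefore carry the bearing cue.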

Live demo: https://sotoalt.dev/experiments/silent.html
Code: https://github.com/SotoAlt/silent
Research journal: https://github.com/SotoAlt/silent/blob/main/docs/JOURNAL.md

Architecture

ViT-Tiny encoder (4-channel input, trained from scratch, ~6M params)
Linear action encoder (frameskip x 3 -> 192)
6-layer autoregressive (causal) transformer predictor with AdaLN-zero conditioning
192 -> 2048 -> 192 projector MLP with BatchNorm
SIGReg regularizer on projected embeddings
Jointly trained state-head MLP (192 -> 256 -> 256 -> 8) with loss weight lambda=10
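
The prediction loss, the SIGReg term and the lambda=10 state head listed above combine into a single training objective. The following is a minimal NumPy sketch of one plausible form, not the repository's code. It assumes SIGReg follows the general recipe of projecting embeddings onto random unit directions and averaging an Epps-Pulley characteristic-function test against N(0, 1) over those directions. The direction count, knot grid, `sigreg_weight`, the ReLU activations, feeding the state head the 192-d embedding, and all function names are assumptions.

```python
import numpy as np


def epps_pulley(x, t):
    """Weighted squared distance between the empirical characteristic
    function of 1-D samples x and that of N(0, 1)."""
    xt = np.outer(x, t)                      # (N, K)
    ecf_re = np.cos(xt).mean(axis=0)         # real part of the empirical char. function
    ecf_im = np.sin(xt).mean(axis=0)         # imaginary part
    gauss = np.exp(-0.5 * t ** 2)            # N(0, 1) char. function, reused as the window
    integrand = ((ecf_re - gauss) ** 2 + ecf_im ** 2) * gauss
    # The integrand is even in t: use the trapezoid rule on [0, t_max], then double.
    area = np.sum((integrand[1:] + integrand[:-1]) * np.diff(t)) / 2.0
    return 2.0 * len(x) * area


def sigreg(z, num_dirs=64, num_knots=17, t_max=3.0, seed=0):
    """Average Epps-Pulley statistic over random 1-D projections of z (N, D)."""
    rng = np.random.default_rng(seed)
    dirs = rng.standard_normal((z.shape[1], num_dirs))
    dirs /= np.linalg.norm(dirs, axis=0, keepdims=True)  # unit-norm slicing directions
    proj = z @ dirs                                       # (N, num_dirs)
    t = np.linspace(0.0, t_max, num_knots)
    return float(np.mean([epps_pulley(proj[:, j], t) for j in range(num_dirs)]))


def state_head(z, params):
    """Hypothetical 192 -> 256 -> 256 -> 8 MLP with ReLU between layers."""
    h = z
    for i, (w, b) in enumerate(params):
        h = h @ w + b
        if i < len(params) - 1:
            h = np.maximum(h, 0.0)
    return h


def joint_loss(pred, target, proj, z, state, params, sigreg_weight=0.1, lam=10.0):
    """Prediction MSE + sigreg_weight * SIGReg(proj) + lam * state-head MSE.
    sigreg_weight is a placeholder; lam=10 matches the value quoted above."""
    pred_loss = np.mean((pred - target) ** 2)
    state_loss = np.mean((state_head(z, params) - state) ** 2)
    return pred_loss + sigreg_weight * sigreg(proj) + lam * state_loss


rng = np.random.default_rng(1)
healthy = rng.standard_normal((512, 192))   # isotropic Gaussian embeddings
collapsed = np.zeros((512, 192))            # fully collapsed embeddings
print(sigreg(healthy), sigreg(collapsed))   # the collapsed batch scores far higher
```

Collapsed embeddings make every projection a point mass, which is far from N(0, 1) and drives the SIGReg term up. This is the failure mode the regularizer is meant to penalize.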

Total: ~13M params. Runs at ~10 Hz on a single shared vCPU.
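
The predictor's AdaLN-zero conditioning can be illustrated with a single residual MLP sub-block. This minimal NumPy sketch assumes the DiT-style formulation, in which a zero-initialised linear map turns a conditioning vector (presumably the 192-d action embedding here) into shift, scale and gate. A full block would also modulate its attention sub-block. The class name and the 768 hidden size are hypothetical.

```python
import numpy as np


def layer_norm(x, eps=1e-6):
    """LayerNorm without a learned affine; AdaLN supplies scale and shift instead."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


class AdaLNZeroMLPBlock:
    """Residual MLP sub-block modulated by a conditioning vector c.

    The modulation layer maps c to (shift, scale, gate). It is zero-initialised,
    so gate == 0 at init and the block is an exact identity map.
    """

    def __init__(self, dim=192, hidden=768, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.standard_normal((dim, hidden)) / np.sqrt(dim)
        self.w2 = rng.standard_normal((hidden, dim)) / np.sqrt(hidden)
        self.w_mod = np.zeros((dim, 3 * dim))  # zero init: the "-zero" in AdaLN-zero
        self.b_mod = np.zeros(3 * dim)

    def __call__(self, x, c):
        # x: (T, dim) token sequence; c: (T, dim) per-step conditioning
        shift, scale, gate = np.split(c @ self.w_mod + self.b_mod, 3, axis=-1)
        h = layer_norm(x) * (1.0 + scale) + shift
        h = np.maximum(h @ self.w1, 0.0) @ self.w2  # ReLU MLP stand-in
        return x + gate * h


rng = np.random.default_rng(0)
block = AdaLNZeroMLPBlock()
x = rng.standard_normal((5, 192))
c = rng.standard_normal((5, 192))
print(np.allclose(block(x, c), x))  # -> True: identity at initialisation
```

Because the gate starts at zero, each predictor block begins as an identity map. The conditioning signal is blended in only as training moves `w_mod` away from zero.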

Files

File	Purpose
`silent_v1_3e_ep030.pt`	Shipping checkpoint: JEPA trained jointly with the DexWM-style state head (lambda=10)
`3e_ep030_head_uniform.pt`	Post-hoc state head used as the cost function for the CEM planner
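
The post-hoc head serves as the cost inside a CEM (cross-entropy method) planner. The loop below is a generic CEM sketch, not the repository's planner. `rollout_cost` is a hypothetical stand-in for "roll the predictor forward under an action sequence and score the predicted state with the head". The horizon, population size, elite count and 2-d action are assumptions.

```python
import numpy as np


def cem_plan(rollout_cost, horizon, action_dim, iters=10, pop=128, n_elite=16,
             init_std=1.0, seed=0):
    """Cross-entropy method over open-loop action sequences (horizon, action_dim).

    rollout_cost maps a batch (pop, horizon, action_dim) to costs of shape (pop,).
    Returns the final mean sequence; an MPC loop would execute only its first action.
    """
    rng = np.random.default_rng(seed)
    mean = np.zeros((horizon, action_dim))
    std = np.full((horizon, action_dim), init_std)
    for _ in range(iters):
        samples = mean + std * rng.standard_normal((pop, horizon, action_dim))
        costs = rollout_cost(samples)
        elites = samples[np.argsort(costs)[:n_elite]]  # lowest-cost candidates
        mean = elites.mean(axis=0)
        std = elites.std(axis=0) + 1e-6  # small floor keeps sampling from collapsing
    return mean


# Toy stand-in for "predict with the world model, score with the state head":
# the cost is the distance of the summed actions from a target displacement.
target = np.array([2.0, -1.0])


def toy_cost(actions):
    return np.linalg.norm(actions.sum(axis=1) - target, axis=-1)


plan = cem_plan(toy_cost, horizon=4, action_dim=2)
print(plan.sum(axis=0))  # close to the target displacement
```

CEM needs only cost evaluations, not gradients. That makes it a natural fit when the cost comes from a separately trained head on top of the world model.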

Quick start

pip install torch torchvision timm einops fastapi uvicorn websockets \
    librosa pymunk h5py pygame scipy

# Clone the code
git clone https://github.com/SotoAlt/silent.git
cd silent

# Download the checkpoints into silent/checkpoints/ (the paths used below)
huggingface-cli download sotoalt/silent --local-dir checkpoints/

# Run the inference server
python -m world_model.infer_silent_env \
    --jepa-ckpt checkpoints/silent_v1_3e_ep030.pt \
    --jepa-head checkpoints/3e_ep030_head_uniform.pt \
    --host 0.0.0.0 --port 8801

# Open http://localhost:8801/ in a browser. WASD to move, space to voice.

Training

The full pipeline (data generation, pure-LeWM smoke test, preflight v2 probe, joint DexWM validation gate, full 100-epoch run, post-hoc head, ship audit) is documented in the research journal and the GitHub README.

Related work

LeWM (Maes, Le Lidec, Scieur, LeCun, Balestriero, 2026), arXiv:2603.19312
DexWM, arXiv:2512.13644 (source of the joint state-head technique)
V-JEPA 2-AC (FAIR, 2025)

License

MIT
