HSTU iter7 + iter8 Preservation (2026-06-07)
Full reproducibility archive for HSTU (Hierarchical Sequential Transduction Units, Meta 2024) experiments on MovieLens-20M and MovieLens-32M.
Headline SOTA result
iter8-exp4 is the new ml-20m × BASE SOTA: NDCG@10 = 0.1948 (FULL canonical eval at epoch 100)
- +2.80% relative improvement over the HSTU-base paper baseline (0.1895)
- Recipe: HSTU-base + PRISM-additive + Input Compression + Time Decay + linear_dropout=0.1 (the key discovery: low dropout matters)
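All results here are reported as NDCG@10. For next-item evaluation with one held-out target per user, NDCG@10 reduces to 1/log2(rank + 1) when the target ranks in the top 10 and 0 otherwise. The per-user values are then averaged. The sketch below assumes that "FULL canonical eval" means ranking the target against the entire item catalog, with no sampled negatives. That is an interpretation, not something this card states. The helper names `ndcg_at_k` and `mean_ndcg_at_k` are hypothetical and are not the codebase's API.

```python
import math
from typing import Sequence


def ndcg_at_k(scores: Sequence[float], target: int, k: int = 10) -> float:
    """NDCG@k for one user with a single relevant item (hypothetical helper).

    scores: model score for every item in the catalog (full ranking, no sampling).
    target: index of the held-out next item.
    """
    target_score = scores[target]
    # 1-indexed rank = 1 + number of items scored strictly higher than the target.
    # Ties are resolved in the target's favor here; real evaluators may differ.
    rank = 1 + sum(1 for s in scores if s > target_score)
    if rank > k:
        return 0.0
    # With one relevant item the ideal DCG is 1/log2(2) = 1, so NDCG equals DCG.
    return 1.0 / math.log2(rank + 1)


def mean_ndcg_at_k(all_scores, targets, k: int = 10) -> float:
    """Average per-user NDCG@k over the evaluation set."""
    values = [ndcg_at_k(s, t, k) for s, t in zip(all_scores, targets)]
    return sum(values) / len(values)
```

For example, with `scores=[0.9, 0.1, 0.5]` and `target=2`, the target ranks second, so its NDCG is 1/log2(3), roughly 0.63.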
Contents
Each `<name>.tar.gz` is a self-contained, reproducible bundle with:
- `config/*.gin`: full gin config plus its include chain
- `ckpts/HSTU-..._ep100`: resumable PyTorch checkpoint (model + optimizer + RNG + epoch + batch_id state)
- `tb/.../events.out.tfevents.*`: full TensorBoard event files with the per-epoch trajectory
- `MANIFEST.json`: machine-readable metadata
Also included is `code-archive.tar.gz`, containing the full forked buck-iter2 codebase. It carries the iter2 patches (resumable checkpoints, EVAL2, in-memory dataset preloader, fbgemm compatibility fix) and the iter7/iter8 patches (PRISM-additive, low dropout, time decay).
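Each bundle is a plain gzipped tarball, so you can inspect its file list and manifest without extracting it. The sketch below uses only the Python standard library. It assumes that exactly one member's basename is `MANIFEST.json` and that the file parses as JSON. It makes no assumptions about field names, because the manifest schema is not documented here. Both function names are hypothetical.

```python
import json
import tarfile


def read_bundle_manifest(bundle_path: str) -> dict:
    """Return the parsed MANIFEST.json from a <name>.tar.gz bundle (hypothetical helper)."""
    with tarfile.open(bundle_path, mode="r:gz") as tar:
        for member in tar.getmembers():
            # Match on basename, since the in-archive directory prefix is not documented.
            if member.isfile() and member.name.rsplit("/", 1)[-1] == "MANIFEST.json":
                with tar.extractfile(member) as fh:
                    return json.load(fh)  # json.load accepts a binary file object
    raise FileNotFoundError(f"no MANIFEST.json in {bundle_path}")


def list_bundle_files(bundle_path: str) -> list[str]:
    """List regular files in the bundle (config, checkpoint, TensorBoard events, manifest)."""
    with tarfile.open(bundle_path, mode="r:gz") as tar:
        return [m.name for m in tar.getmembers() if m.isfile()]
```

For example, `list_bundle_files("iter8-exp4-SOTA.tar.gz")` is a quick way to confirm that the config, checkpoint, TensorBoard events, and manifest all made it into a download before you unpack a multi-GB archive.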
Quick reproduction
```bash
# Download the SOTA bundle (the `hf` CLI ships with huggingface_hub: pip install -U huggingface_hub)
hf download tzchen07/hstu-iter78-preservation iter8-exp4-SOTA.tar.gz --local-dir .
# Download and unpack the code
hf download tzchen07/hstu-iter78-preservation code-archive.tar.gz --local-dir .
tar xzf code-archive.tar.gz
cd buck-iter2
# Set up env (Python 3.11 + torch 2.7.1+cu128 + fbgemm-gpu 1.2.0)
python3 -m venv .venv && source .venv/bin/activate
pip install torch==2.7.1 --index-url https://download.pytorch.org/whl/cu128
pip install fbgemm-gpu==1.2.0 torchrec gin-config tensorboard absl-py pandas numpy
# Stage data: preprocessed ml-20m .npz files (see the separate data archive)
# Run training (the SOTA recipe)
NCCL_TUNER_CONFIG_PATH=/shared/nccl_tuner.textproto \
CUDA_VISIBLE_DEVICES=0 PYTHONPATH=$(pwd) \
python3 generative_recommenders/github/main.py \
--gin_config_file=generative_recommenders/github/configs/ml-20m/iter8-exp4-additive-lowdrop.gin \
--master_port=12300
```
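The bundled configs rely on gin `include` statements, so a run only reproduces if every file in the chain is present. The stdlib-only sketch below walks that chain before you launch training. It assumes includes are written as `include 'path.gin'` and resolve against the repo root, falling back to the including file's directory. Gin itself may resolve paths differently (for example, via search paths registered with gin), so treat this as a pre-flight check rather than a substitute for gin's loader. The function name `gin_include_chain` is hypothetical.

```python
import re
from pathlib import Path

# Matches gin include statements such as:  include 'path/to/base.gin'
INCLUDE_RE = re.compile(r"""^\s*include\s+['"]([^'"]+)['"]""")


def gin_include_chain(config: str, root: str = ".") -> list[Path]:
    """Return `config` followed by every file it transitively includes (depth-first).

    Assumption: include paths resolve relative to `root` (the repo root),
    falling back to the including file's own directory. A missing file
    raises FileNotFoundError, which surfaces a broken chain.
    """
    seen: list[Path] = []

    def visit(path: Path) -> None:
        path = path.resolve()
        if path in seen:
            return  # already visited; also guards against include cycles
        seen.append(path)
        for line in path.read_text().splitlines():
            match = INCLUDE_RE.match(line)
            if not match:
                continue
            candidate = Path(root) / match.group(1)
            if not candidate.exists():
                candidate = path.parent / match.group(1)
            visit(candidate)

    visit(Path(root) / config)
    return seen
```

For example, running `gin_include_chain("generative_recommenders/github/configs/ml-20m/iter8-exp4-additive-lowdrop.gin")` from the `buck-iter2` root should list the SOTA config and everything it pulls in.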
See PRESERVATION_REPORT.md for the full reproducibility audit (including data preprocessing notes, env spec, and per-experiment metadata).
All 8 experiments
| Bundle | NDCG@10 (FULL canonical, ep100) | Dataset | Model | Recipe |
|---|---|---|---|---|
| iter8-exp4-SOTA.tar.gz | 0.1948 (+2.80% vs. paper) | ml-20m | BASE | PRISM-additive + IC + TD + linear_dropout=0.1 |
| iter8-exp1-HardNeg.tar.gz | 0.1920 (+1.32% vs. paper) | ml-20m | BASE | PRISM-additive + IC + TD + HardNeg |
| iter8-exp0-PopDebias.tar.gz | 0.1900 (+0.26% vs. paper) | ml-20m | BASE | PRISM-additive + IC + TD + PopDebias |
| iter7-exp1-PRISM-FILM-PopDebias-SOTA.tar.gz | 0.1912 (+0.90% vs. paper) | ml-20m | BASE | PRISM-FILM + IC + TD + PopDebias |
| iter7-exp2-PRISM-MoE.tar.gz | 0.1895 (matches paper) | ml-20m | BASE | PRISM-MoE + IC + TD |
| iter7-exp0-PRISM-FILM-HardNeg.tar.gz | 0.1880 (−0.79% vs. paper) | ml-20m | BASE | PRISM-FILM + IC + TD + HardNeg |
| iter7-exp5-l700.tar.gz | 0.1901 | ml-20m | BASE | PRISM-FILM + IC + TD, l=700 |
| iter7-exp6-ml32m-base.tar.gz | 0.1508 | ml-32m | BASE | PRISM-additive + IC + TD (first ml-32m × BASE canonical result) |

IC = Input Compression, TD = Time Decay. Deltas are relative to the HSTU-base paper baseline (NDCG@10 0.1895).
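The percentage deltas in the table are relative changes against the HSTU-base paper baseline: (NDCG − 0.1895) / 0.1895. The short check below recomputes them from the table's own NDCG@10 values. It covers only the ml-20m rows that state a delta. The names `RESULTS` and `relative_delta_pct` are illustrative.

```python
PAPER_BASELINE = 0.1895  # HSTU-base NDCG@10 on ml-20m, as cited above

# NDCG@10 values copied from the table (ml-20m x BASE rows with a stated delta).
RESULTS = {
    "iter8-exp4-SOTA": 0.1948,
    "iter8-exp1-HardNeg": 0.1920,
    "iter8-exp0-PopDebias": 0.1900,
    "iter7-exp1-PRISM-FILM-PopDebias-SOTA": 0.1912,
    "iter7-exp2-PRISM-MoE": 0.1895,
    "iter7-exp0-PRISM-FILM-HardNeg": 0.1880,
}


def relative_delta_pct(ndcg: float, baseline: float = PAPER_BASELINE) -> float:
    """Relative change vs. the baseline, in percent, rounded to 2 decimals."""
    return round(100.0 * (ndcg - baseline) / baseline, 2)


for name, ndcg in RESULTS.items():
    print(f"{name:40s} {ndcg:.4f} {relative_delta_pct(ndcg):+.2f}%")
```

Because the deltas are relative rather than absolute, the +2.80% headline corresponds to an absolute gain of 0.0053 NDCG@10.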
License & attribution
- Code: MIT (matches Meta's HSTU OSS code)
- Data: MovieLens (research use, see GroupLens license)
- Trained weights: research artifacts; for reproducibility & verification
Citation
If you use this preservation archive in your work, please cite:
```bibtex
@misc{chen2026hstu_iter78,
  title  = {HSTU iter7+iter8 Preservation: PRISM-additive + low-dropout SOTA on MovieLens-20M},
  author = {Chen, Tony},
  year   = {2026},
  url    = {https://huggingface.co/tzchen07/hstu-iter78-preservation},
  note   = {NDCG@10 0.1948 (+2.80\% over HSTU-base paper baseline)}
}
```
Built on top of:
- Zhai et al., 2024. "Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations" (Meta HSTU)
- Kang et al., 2018. "Self-Attentive Sequential Recommendation" (SASRec)
Preserved 2026-06-07 by Rovo Dev. Full bundle md5 checksums in MANIFEST.md.
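You can check downloaded bundles locally against the MD5 checksums in MANIFEST.md. The sketch below uses Python's `hashlib` and streams each file in chunks, so large checkpoints never need to fit in memory. Its parser assumes each checksum line follows the common `md5sum` form `<32-hex-digest>  <name>.tar.gz`. That layout is an assumption about MANIFEST.md, so adjust `LINE_RE` if the file is formatted differently. The function names are hypothetical.

```python
import hashlib
import re
from pathlib import Path

# Assumed line format (as printed by `md5sum`):  <32 hex chars>  <name>.tar.gz
LINE_RE = re.compile(r"\b([0-9a-fA-F]{32})\s+[*`]?([\w.\-]+\.tar\.gz)")


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in 1 MiB chunks so multi-GB bundles need not fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_bundles(manifest_md: str, bundle_dir: str = ".") -> dict[str, bool]:
    """Map each bundle named in the manifest and present locally to pass/fail."""
    results = {}
    for expected, name in LINE_RE.findall(Path(manifest_md).read_text()):
        path = Path(bundle_dir) / name
        if path.exists():  # bundles not downloaded yet are skipped
            results[name] = md5_of(path) == expected.lower()
    return results
```

For example, after the Quick reproduction download, `verify_bundles("MANIFEST.md")` should report `True` for `iter8-exp4-SOTA.tar.gz` if the transfer was intact.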