HSTU iter7 + iter8 Preservation (2026-06-07)

Full reproducibility archive for HSTU (Hierarchical Sequential Transduction Units, Meta 2024) experiments on MovieLens-20M and MovieLens-32M.

🏆 Headline SOTA result

iter8-exp4 sets a new SOTA for ml-20m × BASE: NDCG@10 = 0.1948 (full canonical eval at epoch 100)

+2.80% relative to the HSTU-base paper baseline (0.1895)
Recipe: HSTU-base + PRISM-additive + Input Compression (IC) + Time Decay (TD) + linear_dropout=0.1 (the key discovery: low dropout matters)
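
For readers unfamiliar with the metric, the sketch below shows how NDCG@10 is commonly computed in leave-one-out sequential recommendation, where each user has exactly one held-out target item. It assumes that protocol and a 1-based rank for the target. It is not taken from the archived evaluation code, and `ndcg_at_k` is a hypothetical helper name.

```python
import math

def ndcg_at_k(target_ranks, k=10):
    """Mean NDCG@k when every user has exactly one relevant item.

    target_ranks: 1-based rank of the held-out item for each user.
    With a single relevant item the ideal DCG is 1, so NDCG reduces to
    1 / log2(rank + 1) if the item is in the top k, else 0.
    """
    gains = [1.0 / math.log2(r + 1) if r <= k else 0.0 for r in target_ranks]
    return sum(gains) / len(gains)

# Toy example: three users whose targets land at ranks 1, 3 and 50.
toy_score = ndcg_at_k([1, 3, 50])
print(f"toy NDCG@10 = {toy_score:.4f}")
```

Under this definition a target ranked first contributes 1.0, a target at rank 3 contributes 0.5, and anything outside the top 10 contributes nothing.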

📦 Contents

Each <name>.tar.gz is a self-contained reproducible bundle with:

config/*.gin — full gin config + include chain
ckpts/HSTU-..._ep100 — resumable PyTorch checkpoint (model + optimizer + RNG + epoch + batch_id state)
tb/.../events.out.tfevents.* — full TensorBoard event files with per-epoch trajectory
MANIFEST.json — machine-readable metadata
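
As a quick sanity check after downloading, the layout listed above can be verified without unpacking the archive. This is a minimal sketch: how the entries are nested inside the tarball (for example under a top-level directory) is an assumption, so the check looks for the names anywhere in each member path. `missing_parts` and `check_bundle` are hypothetical helpers, not part of the archived code.

```python
import tarfile
from pathlib import PurePosixPath

# Entries the card says every bundle contains.
EXPECTED_PARTS = ("config", "ckpts", "tb", "MANIFEST.json")

def missing_parts(member_names, expected=EXPECTED_PARTS):
    """Return the expected entries that no member path contains."""
    seen = set()
    for name in member_names:
        seen.update(PurePosixPath(name).parts)
    return [part for part in expected if part not in seen]

def check_bundle(path):
    """Open a .tar.gz bundle and report which expected entries are absent."""
    with tarfile.open(path, "r:gz") as tar:
        return missing_parts(tar.getnames())
```

Calling `check_bundle("iter8-exp4-SOTA.tar.gz")` returns an empty list when all four entries are present.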

The repository also includes code-archive.tar.gz, which contains the full forked buck-iter2 codebase with the iter2 patches (resumable ckpts, EVAL2, in-memory dataset preloader, fbgemm compat fix) and the iter7/iter8 patches (PRISM-additive, low-dropout, time-decay).

🚀 Quick reproduction

# Download and unpack the SOTA bundle
hf download tzchen07/hstu-iter78-preservation iter8-exp4-SOTA.tar.gz --local-dir .
tar xzf iter8-exp4-SOTA.tar.gz

# Download the code
hf download tzchen07/hstu-iter78-preservation code-archive.tar.gz --local-dir .
tar xzf code-archive.tar.gz
cd buck-iter2

# Set up env (Python 3.11 + torch 2.7.1+cu128 + fbgemm-gpu 1.2.0)
python3 -m venv .venv && source .venv/bin/activate
pip install torch==2.7.1 --index-url https://download.pytorch.org/whl/cu128
pip install fbgemm-gpu==1.2.0 torchrec gin-config tensorboard absl-py pandas numpy

# Stage data (preprocessed ml-20m npz files - see separate data archive)

# Run training (the SOTA recipe)
NCCL_TUNER_CONFIG_PATH=/shared/nccl_tuner.textproto \
CUDA_VISIBLE_DEVICES=0 PYTHONPATH=$(pwd) \
python3 generative_recommenders/github/main.py \
    --gin_config_file=generative_recommenders/github/configs/ml-20m/iter8-exp4-additive-lowdrop.gin \
    --master_port=12300

See PRESERVATION_REPORT.md for the full reproducibility audit (including data preprocessing notes, env spec, and per-experiment metadata).
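
Before launching a long run, it can help to confirm that the active environment matches the pins from the setup step above. The sketch below compares installed versions against those pins using only the standard library. The helper names are hypothetical, and stripping a local build suffix such as `+cu128` before comparing is an assumption about how the wheels report their versions.

```python
from importlib import metadata

# Pins taken from the setup step; keys are the PyPI distribution names
# used in the pip commands.
EXPECTED = {"torch": "2.7.1", "fbgemm-gpu": "1.2.0"}

def base_version(version):
    """Drop a local build suffix such as '+cu128'."""
    return version.split("+", 1)[0]

def find_mismatches(installed, expected=EXPECTED):
    """Map package -> (expected, found) for every pin that does not match."""
    bad = {}
    for name, want in expected.items():
        got = installed.get(name)
        if got is None or base_version(got) != want:
            bad[name] = (want, got)
    return bad

def installed_versions(names):
    """Look up installed versions; packages that are not installed map to None."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found

if __name__ == "__main__":
    problems = find_mismatches(installed_versions(EXPECTED))
    print(problems or "environment matches the pins")
```

An empty result means both pins match. Any other result lists the expected and found version for each package that differs.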

📊 All 8 experiments

Bundle	NDCG@10 (full canonical eval, ep100)	Dataset	Model	Recipe
`iter8-exp4-SOTA.tar.gz` ⭐	0.1948 (+2.80% vs. paper)	ml-20m	BASE	PRISM-additive + IC + TD + dropout=0.1
`iter8-exp1-HardNeg.tar.gz`	0.1920 (+1.32% vs. paper)	ml-20m	BASE	PRISM-additive + IC + TD + HardNeg
`iter8-exp0-PopDebias.tar.gz`	0.1900 (+0.26% vs. paper)	ml-20m	BASE	PRISM-additive + IC + TD + PopDebias
`iter7-exp1-PRISM-FILM-PopDebias-SOTA.tar.gz`	0.1912 (+0.90% vs. paper)	ml-20m	BASE	PRISM-FILM + IC + TD + PopDebias
`iter7-exp2-PRISM-MoE.tar.gz`	0.1895 (matches paper)	ml-20m	BASE	PRISM-MoE + IC + TD
`iter7-exp0-PRISM-FILM-HardNeg.tar.gz`	0.1880 (−0.79% vs. paper)	ml-20m	BASE	PRISM-FILM + IC + TD + HardNeg
`iter7-exp5-l700.tar.gz`	0.1901	ml-20m	BASE	PRISM-FILM + IC + TD, l=700
`iter7-exp6-ml32m-base.tar.gz`	0.1508	ml-32m	BASE	PRISM-additive + IC + TD (first ml-32m × BASE canonical result)
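
The percentages in the table are relative changes against the 0.1895 paper baseline. The short sketch below recomputes them from the NDCG@10 values listed above. The ml-32m row is left out because the card quotes no paper baseline for that dataset, and the dictionary keys are shortened bundle names used only for this illustration.

```python
PAPER_BASELINE = 0.1895  # HSTU-base NDCG@10 on ml-20m, as quoted in this card

# ml-20m results copied from the table.
RESULTS = {
    "iter8-exp4": 0.1948,
    "iter8-exp1": 0.1920,
    "iter8-exp0": 0.1900,
    "iter7-exp1": 0.1912,
    "iter7-exp2": 0.1895,
    "iter7-exp0": 0.1880,
    "iter7-exp5": 0.1901,
}

def relative_delta_pct(score, baseline=PAPER_BASELINE):
    """Percent change of score relative to baseline."""
    return (score - baseline) / baseline * 100

DELTAS = {name: round(relative_delta_pct(score), 2) for name, score in RESULTS.items()}

# Print the experiments from largest gain to largest loss.
for name, delta in sorted(DELTAS.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {delta:+.2f}%")
```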

⚖️ License & attribution

Code: MIT (matches Meta's HSTU OSS code)
Data: MovieLens (research use, see GroupLens license)
Trained weights: research artifacts; for reproducibility & verification

🙏 Citation

If you use this preservation in your work:

@misc{chen2026hstu_iter78,
  title  = {HSTU iter7+iter8 Preservation: PRISM-additive + low-dropout SOTA on MovieLens-20M},
  author = {Chen, Tony},
  year   = {2026},
  url    = {https://huggingface.co/tzchen07/hstu-iter78-preservation},
  note   = {NDCG@10 0.1948 (+2.80\% over HSTU-base paper baseline)}
}

Built on top of:

Zhai et al., 2024. "Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations" (Meta HSTU)
Kang and McAuley, 2018. "Self-Attentive Sequential Recommendation" (SASRec)

Preserved 2026-06-07 by Rovo Dev. Full bundle md5 checksums in MANIFEST.md.
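
To check a downloaded bundle against its published md5 checksum, a streaming hash avoids loading a multi-gigabyte archive into memory. This is a minimal sketch using only the standard library. The layout of the manifest is not described here, so the expected digest is passed in by hand, and both function names are hypothetical.

```python
import hashlib

def md5_of_file(path, chunk_size=1 << 20):
    """Stream a file through MD5 so large bundles need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_checksum(path, expected_md5):
    """Compare a file's MD5 with a published hex digest, ignoring case."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

For example, `matches_checksum("iter8-exp4-SOTA.tar.gz", digest_from_manifest)` returns `True` only when the local file is byte-identical to the published one.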
