Recurrent-Llama-3.2-1B

A depth-recurrent (Huginn / Raven) language model retrofitted from meta-llama/Llama-3.2-1B by model surgery followed by a recurrence-curriculum healing phase.

Instead of a fixed stack of layers, the model has a small recurrent core block that is looped a controllable number of times at inference. This lets you spend more compute on harder inputs without adding parameters: the "think deeper" knob is num_steps (a.k.a. recurrence depth).

Method: "Teaching Pretrained Language Models to Think Deeper with Retrofitted Recurrence" (McLeish et al., 2025), built on the Huginn/Raven architecture of "Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach" (Geiping et al., 2025).

Architecture

input → embed → prelude (4 layers) → [ adapter + recurrent core (6 layers) ] × R → coda (4 layers) → norm → lm_head
                                     └─────────── looped R times ──────────┘

Base model: Llama-3.2-1B (16 layers)
Split (prelude / recurrent core / coda): 4 / 6 / 4 (source layers 4–5 dropped)
Recurrence at inference: any num_steps; trained up to 16
Block / norm / RoPE: Llama pre-norm, RMSNorm, native Llama-3 RoPE (θ=500000)
Params: ~1.39B
model_type: huginn_raven
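
The split can be illustrated with a few lines of index arithmetic. This is a minimal sketch, assuming the dropped layers are the ones sitting directly between the prelude and the recurrent core (which matches the dropped source layers 4–5 stated here); the function name split_layers is hypothetical and is not part of the conversion tool.

```python
def split_layers(num_layers=16, prelude=4, core=6, coda=4):
    """Map source-layer indices to prelude / dropped / core / coda blocks."""
    dropped = num_layers - (prelude + core + coda)
    if dropped < 0:
        raise ValueError("blocks need more layers than the source model has")
    core_start = prelude + dropped      # dropped layers follow the prelude
    coda_start = core_start + core
    return {
        "prelude": list(range(prelude)),
        "dropped": list(range(prelude, core_start)),
        "core": list(range(core_start, coda_start)),
        "coda": list(range(coda_start, num_layers)),
    }


layout = split_layers()
for name, indices in layout.items():
    print(name, indices)
```

With the defaults, 4 + 6 + 4 = 14 of the 16 source layers are kept and two are dropped.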

The adapter re-injects the prelude output at every recurrent step, so the latent state cannot drift away from the input regardless of depth.
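
The control flow above can be sketched with plain Python callables standing in for the real transformer blocks. The function name recurrent_forward, the init_state argument, and the toy scalar blocks are illustrative assumptions; this is not the API of the bundled modeling code.

```python
def recurrent_forward(x, prelude, adapter, core, coda, num_steps, init_state):
    """Toy depth-recurrent forward pass: prelude once, core looped, coda once."""
    e = prelude(x)      # prelude output, computed once per input
    s = init_state      # latent state (hypothetical initial value)
    for _ in range(num_steps):
        # The adapter sees both the current state and the prelude output,
        # so the input is re-injected at every recurrent step.
        s = core(adapter(s, e))
    return coda(s)


# Toy scalar stand-ins for the real blocks (assumptions for illustration only).
toy = dict(
    prelude=lambda x: x + 1.0,
    adapter=lambda s, e: 0.5 * s + e,
    core=lambda h: h,
    coda=lambda s: s,
)
shallow = recurrent_forward(1.0, num_steps=1, init_state=0.0, **toy)
deep = recurrent_forward(1.0, num_steps=3, init_state=0.0, **toy)
```

The same blocks (and therefore the same parameters) are reused at every step; only num_steps changes how much compute is spent.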

Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "irafm-llm/Recurrent-Llama-3.2-1B"
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True, torch_dtype=torch.bfloat16).cuda().eval()

ids = tok("The history of mathematics is", return_tensors="pt").input_ids.cuda()
out = model.generate(ids, max_new_tokens=40, do_sample=False,
                     num_steps=32,          # <-- recurrence depth: raise for more test-time compute
                     tokenizer=tok, pad_token_id=tok.eos_token_id)
print(tok.decode(out[0][ids.shape[1]:], skip_special_tokens=True))
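
To see the effect of the num_steps knob, it can help to run the same prompt at several depths and compare the outputs. The helper below is a small sketch; sweep_num_steps and generate_text are hypothetical names, and the commented wiring assumes the model, tok, and ids objects from the snippet above.

```python
def sweep_num_steps(generate_text, depths=(4, 8, 16, 32)):
    """Call generate_text(num_steps) at each depth and return {depth: text}."""
    return {depth: generate_text(depth) for depth in depths}


# Hypothetical wiring, reusing `model`, `tok`, and `ids` from the snippet above:
# def generate_text(num_steps):
#     out = model.generate(ids, max_new_tokens=40, do_sample=False,
#                          num_steps=num_steps, tokenizer=tok,
#                          pad_token_id=tok.eos_token_id)
#     return tok.decode(out[0][ids.shape[1]:], skip_special_tokens=True)
#
# for depth, text in sweep_num_steps(generate_text).items():
#     print(depth, text)
```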

trust_remote_code=True is required because the repo bundles its own modeling code, raven_modeling_minimal.py. The model exposes the full Huginn-0125 step API (embed_inputs, initialize_state, iterate_one_step, predict_from_latents, forward_with_adaptive_compute) and is a drop-in replacement for code written against Huginn-0125, including per-sentence selective-recurrence control.

How it was made

  1. Surgery. Llama-3.2-1B's layers are split into prelude (0–3), recurrent core (6–11) and coda (12–15); layers 4–5 are dropped. Attention/MLP/norm weights are copied verbatim (fused QKV and gate-up), and the embeddings are untied. The conversion reproduces the source model's logits to numerical precision (full-cover check: logits MSE ~1e-11) and is bit-identical to the official smcleish/Recurrent-Llama-3.2-untrained on all non-adapter tensors.
  2. Healing. ~98M tokens of FineWeb-Edu, sequence length 1024, AdamW lr 5e-5, grad-clip 1.0, bf16. The mean recurrence is ramped with a 1-sqrt curriculum up to 16; depth is sampled per-step (log-normal-Poisson) and gradients are truncated to the last 8 recurrent passes (truncated BPTT). Eval loss (@rec 16): 14.2 β†’ 2.8.
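
The per-step depth sampling and gradient truncation in the healing phase can be sketched as follows. This is an assumption-laden illustration, not the training code: the parameterization (a normal draw centered so that exp(tau) has the target mean, a Poisson draw at that rate, then a shift by one), the value sigma=0.5, and the function names are all assumed here, and the 1-sqrt ramp of the mean depth is not modeled.

```python
import math
import random


def sample_depth(mean_depth, sigma=0.5, rng=random):
    """Draw a recurrence depth >= 1 from a log-normal Poisson distribution."""
    # tau ~ Normal(log(mean) - sigma^2 / 2, sigma), so E[exp(tau)] == mean_depth.
    tau = rng.gauss(math.log(mean_depth) - 0.5 * sigma ** 2, sigma)
    rate = math.exp(tau)
    # Knuth's multiplication method for a Poisson(rate) draw (stdlib only).
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count + 1  # assumed shift so that at least one pass is always run


def split_for_truncated_bptt(depth, grad_steps=8):
    """Return (passes without gradient, passes with gradient) for one step."""
    tracked = min(depth, grad_steps)  # only the last passes are backpropagated
    return depth - tracked, tracked


rng = random.Random(0)
depth = sample_depth(16, rng=rng)
no_grad_passes, grad_passes = split_for_truncated_bptt(depth)
```

In a real training loop the first group of passes would typically run without gradient tracking and only the second group would contribute to the backward pass.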

Limitations

  • Demonstration-scale healing. ~98M tokens vs the paper's ~50B; output is fluent but can be repetitive under greedy decoding. Not instruction-tuned.
  • Inherits Llama-3.2's knowledge cutoff, biases and the Llama 3.2 Community License.
  • Recurrence was trained up to depth 16; higher num_steps works but is extrapolation.

Citation

@article{mcleish2025teaching,
  title={Teaching Pretrained Language Models to Think Deeper with Retrofitted Recurrence},
  author={McLeish, Sean and Li, Ang and Kirchenbauer, John and Kalra, Dayal Singh and Bartoldson, Brian R. and Kailkhura, Bhavya and Schwarzschild, Avi and Geiping, Jonas and Goldstein, Tom and Goldblum, Micah},
  journal={arXiv preprint arXiv:2511.07384}, year={2025}
}

Built with Llama. Converted with huginn_surgery.
