LoopLM-135M-naive
A 135M parameter dense looped transformer trained from scratch on FineWeb. Built as part of an exploration of looped LLM architectures inspired by Parcae.
This is the naive looped variant — a clean baseline without Parcae's LTI stability mechanisms, which were found to underperform at this scale across 5 ablations.
📂 Code: github.com/harims95/LoopLM 📄 Parcae paper: arXiv:2604.12946
Quick Start
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("harims95/LoopLM-135M-naive")
model = AutoModelForCausalLM.from_pretrained(
    "harims95/LoopLM-135M-naive",
    trust_remote_code=True,  # the model is loaded through custom modeling code
)
model.eval()

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)

# Greedy pick of the next token from the logits at the last position
next_token_id = out.logits[0, -1].argmax().item()
print(tokenizer.decode([next_token_id]))
For generation with sampling (top-k + temperature), use scripts/generate.py.
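As a rough illustration of what top-k plus temperature sampling does to a single logits vector, here is a minimal standard-library sketch. It is not the code in scripts/generate.py; the function name `sample_top_k` and its defaults are hypothetical and chosen only for illustration.

```python
import math
import random

def sample_top_k(logits, top_k=50, temperature=0.8, rng=random):
    """Sample one token id from raw logits with top-k filtering and temperature.

    Hypothetical helper for illustration; not taken from scripts/generate.py.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Keep the indices of the k largest logits.
    k = min(top_k, len(logits))
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Scale by temperature, then take a softmax over the survivors
    # (subtracting the max keeps exp() numerically stable).
    scaled = [logits[i] / temperature for i in top]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(top, weights=weights, k=1)[0]

toy_logits = [0.0, 5.0, 1.0, -2.0]
print(sample_top_k(toy_logits, top_k=2, rng=random.Random(0)))
```

With the Quick Start snippet, the input would be `out.logits[0, -1].tolist()`. Appending the sampled id to the prompt and running the model again gives a plain autoregressive loop.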
Architecture
Input tokens
↓
[Embedding]
↓
[Prelude: 4 transformer blocks]
↓
e (input injection)
↓
[Loop block × T loops] ← T ~ Poisson(μ=6) per-sequence
↓ Update: h_{t+1} = block(h_t + e)
h_final
↓
[Coda: 2 transformer blocks]
↓
[Tied lm_head] → logits
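The data flow in the diagram can be sketched with plain Python callables standing in for transformer blocks. This is a toy under stated assumptions, not the repository's model code: the initial state h_0 = 0 and the clamp to at least one loop are guesses the diagram does not confirm, and `looped_forward` and `sample_poisson` are hypothetical names.

```python
import math
import random

def sample_poisson(mu, rng):
    """Draw from Poisson(mu) with Knuth's method (adequate for small mu)."""
    limit = math.exp(-mu)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def looped_forward(x, prelude, loop_block, coda, num_loops, h0=0.0):
    """Prelude -> num_loops passes through ONE shared block -> coda."""
    for block in prelude:
        x = block(x)
    e = x        # input injection: computed once, re-added on every loop
    h = h0       # ASSUMPTION: the initial recurrent state is zero
    for _ in range(num_loops):
        h = loop_block(h + e)   # h_{t+1} = block(h_t + e)
    for block in coda:
        h = block(h)
    return h

# Toy scalar "blocks" so the flow can be traced by hand.
prelude = [lambda v: v + 1.0]
coda = [lambda v: 2.0 * v]
loop_block = lambda v: 0.5 * v

rng = random.Random(0)
# ASSUMPTION: clamp to at least one loop; the card does not say how T = 0 is handled.
T = max(1, sample_poisson(6, rng))
print(T, looped_forward(1.0, prelude, loop_block, coda, num_loops=T))
```

Because the same `loop_block` is applied on every pass, increasing T adds compute but no parameters.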
Specs:
- Type: Dense looped transformer (recurrent reuse of one transformer block)
- Total params: 135M (134.1M unique trainable, tied input/output embeddings)
- d_model: 1024
- Attention: GQA with 16 query heads / 8 KV heads, head_dim=64
- Position encoding: RoPE (θ=10000)
- Normalization: RMSNorm pre-norm, QK-norm
- FFN: SwiGLU, dense_ffn=2816
- Vocab: 50304 (GPT-2 BPE + padding), tied embeddings
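The 134.1M figure can be approximately reproduced from the specs above. The sketch below assumes bias-free linear layers, ignores the small RMSNorm and QK-norm gain vectors, and counts 7 unique blocks (4 prelude + 1 shared loop block + 2 coda); those assumptions are mine, not confirmed by the model code.

```python
d_model, n_q_heads, n_kv_heads, head_dim = 1024, 16, 8, 64
ffn_dim, vocab = 2816, 50304
unique_blocks = 4 + 1 + 2   # prelude + one shared loop block + coda

# Attention with grouped-query attention: K and V use fewer heads than Q.
attn = d_model * n_q_heads * head_dim          # Q projection
attn += 2 * d_model * n_kv_heads * head_dim    # K and V projections
attn += n_q_heads * head_dim * d_model         # output projection

ffn = 3 * d_model * ffn_dim    # SwiGLU: gate, up, and down matrices
embedding = vocab * d_model    # counted once because lm_head is tied

total = unique_blocks * (attn + ffn) + embedding
print(f"{total:,} parameters (~{total / 1e6:.1f}M)")
```

Under these assumptions the count comes to 134,086,656, which rounds to the 134.1M listed above. The loop block contributes its weights once regardless of how many times it is applied.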
Training
| Setting | Value |
|---|---|
| Dataset | FineWeb (raw, kjj0/fineweb10B-gpt2) |
| Tokens consumed | ~4.6B |
| Steps | 17,500 |
| Hardware | 2× H100 on Modal |
| Wall clock | ~3 hours |
| Total cost | ~$22 |
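The token total can be cross-checked against the step count and the batch composition given under Hyperparameters. This is plain arithmetic on the figures reported in this card:

```python
steps = 17_500
micro_batch, seq_len, gpus, grad_accum = 32, 1024, 2, 4

# Tokens processed per optimizer step across both GPUs and all accumulation steps.
tokens_per_step = micro_batch * seq_len * gpus * grad_accum
total_tokens = steps * tokens_per_step

print(f"{tokens_per_step:,} tokens/step")
print(f"{total_tokens:,} tokens (~{total_tokens / 1e9:.1f}B)")
```

This gives 262,144 tokens per step and about 4.59B tokens overall, consistent with the ~4.6B in the table.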
Hyperparameters:
- Batch: 262,144 tokens/step (micro=32 × seq=1024 × 2 GPUs × accum=4)
- Optimizer: Muon (matrices) + AdamW (norms, biases, embeddings)
- LR: Muon 0.02, AdamW 3e-4
- Schedule: 100-step warmup, 60% constant LR, 40% cosine decay to 0.1× peak
- Precision: bf16 with fp32 logits
- μ_rec=6 Poisson per-sequence loop depth
- μ_bwd=3 truncated BPTT (gradients only through last 3 loops)
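The schedule line above can be read as a multiplier on each optimizer's peak learning rate. The sketch below is one plausible interpretation, not the training code: it assumes linear warmup, assumes the 60% constant phase includes the 100 warmup steps, and uses the hypothetical name `lr_multiplier`.

```python
import math

def lr_multiplier(step, total_steps=17_500, warmup_steps=100,
                  decay_frac=0.4, final_ratio=0.1):
    """Warmup, then constant, then cosine decay to final_ratio of the peak.

    ASSUMPTIONS: linear warmup; warmup counts toward the constant 60%.
    """
    decay_steps = round(total_steps * decay_frac)
    decay_start = total_steps - decay_steps
    if step < warmup_steps:
        return (step + 1) / warmup_steps
    if step < decay_start:
        return 1.0
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = min((step - decay_start) / max(1, decay_steps), 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return final_ratio + (1.0 - final_ratio) * cosine

for step in (0, 99, 10_499, 14_000, 17_500):
    m = lr_multiplier(step)
    print(step, f"Muon={0.02 * m:.5f}", f"AdamW={3e-4 * m:.6f}")
```

A function of this shape could be passed as `lr_lambda` to `torch.optim.lr_scheduler.LambdaLR` for each optimizer, with the peak rates (Muon 0.02, AdamW 3e-4) set as the base learning rates.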
Results
| Model | Architecture | Tokens | Val Loss (FineWeb) |
|---|---|---|---|
| HobbyLM-30M (prior) | Dense (8 layers) | 1B | 3.91 |
| LoopLM-135M-naive (this) | Dense looped | 4.6B | 3.95 |
| HobbyLM-130M MoE (sibling) | MoE (140M total / 62M active) | 10B | 3.30 |
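At this scale, the sparse MoE sibling reached a lower validation loss than the dense looped model at a similar total parameter count (140M vs 135M), although it was also trained on roughly twice as many tokens (10B vs 4.6B), so the comparison is not token-matched. Against the 30M dense baseline the picture is less clear: the looped model's 3.95 is slightly above the baseline's 3.91 in this table, so a benefit from looping cannot be read off these numbers alone.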
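For readers who think in perplexity rather than loss, the table values convert directly, assuming the reported validation loss is mean cross-entropy in nats per token (the usual convention, but not stated in this card):

```python
import math

# Validation losses copied from the results table above.
val_losses = {
    "HobbyLM-30M": 3.91,
    "LoopLM-135M-naive": 3.95,
    "HobbyLM-130M MoE": 3.30,
}

# ASSUMPTION: loss is cross-entropy in nats/token, so perplexity = exp(loss).
perplexity = {name: math.exp(loss) for name, loss in val_losses.items()}
for name, ppl in perplexity.items():
    print(f"{name}: loss {val_losses[name]:.2f} -> perplexity ~{ppl:.1f}")
```

Under that assumption the three models sit at roughly 49.9, 51.9, and 27.1 perplexity respectively.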
The Parcae Investigation (Honest Findings)
This project began as an attempt to reproduce Parcae's LTI stability mechanisms for looped LMs. Across 5 ablations, none of the Parcae variants beat the naive baseline:
| # | Variant | Description | Final Val |
|---|---|---|---|
| 1 | Naive | h_{t+1} = block(h_t + e) | 3.84 (FineWeb-Edu) |
| 2 | A matrix | + LTI step in parallel | 3.84 (tied) |
| 3 | + input norm v1 | Wrong arch flow | diverged |
| 4 | + LTI before block | Fixed arch + B identity init | worse |
| 5 | + B → AdamW (wd=0) | Match official optimizer routing | dramatically worse |
Each "fix" that brought the implementation closer to the official Parcae code made performance worse. After consulting the paper's Appendix Q and the official repo, and after multiple debugging passes, the working conclusion is:
Parcae's stability mechanisms appear to require larger scale (1B+ params, 100B+ tokens) to demonstrate benefit. At 135M params / 4.6B tokens, naive looped reuse is competitive enough.
The Parcae paper itself reports its stability tricks help most when training runs into "late-stage loss spikes after 170k steps." Our runs were at 17.5k steps. We never reached the regime where these mechanisms pay off.
Example Outputs (Cherry-picked)
Generated at temperature=0.8, top_k=50:
Prompt: "The advantages of solar energy include"
The advantages of solar energy include the advantages of solar energy. At the same time, solar energy is used for generating electricity, and solar energy is the first choice for solar power generation. Solar energy is generally renewable, and is considered a renewable energy.
Prompt: "Once upon a time, in a small village,"
Once upon a time, in a small village, where you could be greeted with a gentle, friendly face. This beautiful, charming village is situated on a calm, peaceful setting; with its peaceful nature and calm nature, this charming village does not exist.
Honest assessment: Locally fluent, syntactically valid English. Prone to repetition and invented facts. Expected behavior for a 135M model trained on ~4.6B tokens — not competitive with modern instruction-tuned models, but a clean from-scratch baseline.
Reproducibility
Full training code: github.com/harims95/LoopLM
spec.json in this repo contains the exact training configuration (CLI args, model config, train config, git commit hash, GPU type, PyTorch version).
To reproduce the training run:
git clone https://github.com/harims95/LoopLM
cd LoopLM
pip install -r requirements.txt
pip install modal && modal token new
# Download FineWeb shards to Modal volume
python -m modal run --detach training/modal_train.py \
--action download --dataset fineweb --shards 50
# Train
python -m modal run --detach training/modal_train.py \
--action train --preset 135M --steps 20000 \
--run-name looplm_naive --gpus 2 --micro 32 \
--seq-len 1024 --batch-tokens 262144 \
--dataset fineweb \
--overrides "use_a_matrix=false,use_input_norm=false"
Limitations
- Not instruction-tuned. This is a base model only.
- Small. 135M parameters; expect hallucination and limited factual recall.
- Repetition. No repetition penalty applied at training; generation benefits from top_k sampling.
- No .generate() polish. The HF wrapper returns logits; a vanilla sampling loop is in scripts/generate.py.
- English only. Tokenizer is GPT-2 BPE; training data is FineWeb English.
Acknowledgments
- @harishsg993010 — training infrastructure (Muon, data loader, Modal harness, optimizer setup)
- Sandy Research — official Parcae implementation that helped me debug
- The Parcae authors — paper and honest scaling analysis
- kjj0 — FineWeb GPT-2 tokenized shards
- Modal Labs — accessible H100 training
License
Apache 2.0
Citation
If you use this model or the Parcae findings, please cite:
@misc{looplm-135m-naive,
author = {Hari},
title = {LoopLM-135M-naive: A dense looped transformer with honest Parcae ablations},
year = {2026},
publisher = {HuggingFace},
url = {https://huggingface.co/harims95/LoopLM-135M-naive}
}