LoopLM-135M-naive

A 135M-parameter dense looped transformer trained from scratch on FineWeb, built as part of an exploration of looped LLM architectures inspired by Parcae.

This is the naive looped variant: a clean baseline without Parcae's LTI stability mechanisms, which did not beat it at this scale across 5 ablations.

📂 Code: github.com/harims95/LoopLM
📄 Parcae paper: arXiv:2604.12946

Quick Start

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("harims95/LoopLM-135M-naive")
model = AutoModelForCausalLM.from_pretrained(
    "harims95/LoopLM-135M-naive",
    trust_remote_code=True,
)
model.eval()

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)

next_token_id = out.logits[0, -1].argmax().item()
print(tokenizer.decode([next_token_id]))

For generation with sampling (top-k + temperature), use scripts/generate.py.
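As a rough illustration of what top-k plus temperature sampling does with the logits the model returns, here is a minimal pure-Python sketch. It is not the code in scripts/generate.py; the function name `sample_top_k` and its signature are hypothetical, chosen only for this example.

```python
import math
import random
from typing import Sequence


def sample_top_k(logits: Sequence[float], k: int, temperature: float,
                 rng: random.Random) -> int:
    """Sample one token id from the k highest logits after temperature scaling.

    Hypothetical helper for illustration, not part of the LoopLM repo.
    """
    if k < 1 or temperature <= 0:
        raise ValueError("k must be >= 1 and temperature must be > 0")
    # Indices of the k largest logits (all of them if k exceeds the vocab size).
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Temperature scaling, then a numerically stable softmax over the survivors.
    scaled = [logits[i] / temperature for i in top]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(top, weights=weights, k=1)[0]


rng = random.Random(0)
toy_logits = [0.1, 2.0, -1.0, 1.5]
print(sample_top_k(toy_logits, k=2, temperature=0.8, rng=rng))  # prints 1 or 3
```

With the Quick Start model, `out.logits[0, -1].tolist()` would supply `logits`; appending the sampled id to the input ids and repeating gives a basic generation loop.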

Architecture

Input tokens
    ↓
[Embedding]
    ↓
[Prelude: 4 transformer blocks]
    ↓
e (input injection)
    ↓
[Loop block × T loops]  ← T ~ Poisson(μ=6) per-sequence
    ↓                      Update: h_{t+1} = block(h_t + e)
h_final
    ↓
[Coda: 2 transformer blocks]
    ↓
[Tied lm_head] → logits
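
The data flow in the diagram can be sketched in a few lines of plain Python. This is a structural sketch under stated assumptions, not the repo's implementation: the names `sample_loop_depth` and `looped_forward` are hypothetical, the initial state `h0` is passed in explicitly because the diagram does not specify it, and clamping the sampled depth to at least 1 is an assumption about how a Poisson draw of 0 might be handled. Embedding and lm_head are omitted.

```python
import math
import random
from typing import Callable, TypeVar

H = TypeVar("H")  # hidden state: a float in this toy, a tensor in a real model


def sample_loop_depth(mu: float, rng: random.Random) -> int:
    """Draw T ~ Poisson(mu) with Knuth's method; clamp to >= 1 (assumption)."""
    threshold = math.exp(-mu)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return max(1, count)


def looped_forward(x: H, prelude: Callable[[H], H], loop_block: Callable[[H], H],
                   coda: Callable[[H], H], num_loops: int, h0: H) -> H:
    """Prelude -> repeated shared block with input injection -> coda."""
    e = prelude(x)             # input injection, reused at every loop step
    h = h0                     # initial state is an assumption of this sketch
    for _ in range(num_loops):
        h = loop_block(h + e)  # h_{t+1} = block(h_t + e)
    return coda(h)


# Toy run with scalar stand-ins for the transformer blocks.
T = sample_loop_depth(6.0, random.Random(0))
out = looped_forward(1.0, prelude=lambda x: x + 1.0, loop_block=lambda z: 0.5 * z,
                     coda=lambda h: h, num_loops=3, h0=0.0)
print(T, out)
```

The same block weights are applied at every loop step, which is why increasing T adds compute but no parameters.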

Specs:

Type: Dense looped transformer (recurrent reuse of one transformer block)
Total params: 135M (134.1M unique trainable, tied input/output embeddings)
d_model: 1024
Attention: GQA with 16 query heads / 8 KV heads, head_dim=64
Position encoding: RoPE (θ=10000)
Normalization: RMSNorm pre-norm, QK-norm
FFN: SwiGLU, dense_ffn=2816
Vocab: 50304 (GPT-2 BPE + padding), tied embeddings
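
The parameter total can be roughly cross-checked from the specs above. The estimate below assumes bias-free linear layers, a SwiGLU FFN with three weight matrices (gate, up, down), and ignores the small RMSNorm and QK-norm weights; those assumptions are mine, not read from the repo. The loop block is counted once because its weights are shared across loop steps.

```python
# Back-of-envelope parameter count from the listed specs.
d_model = 1024
n_q_heads, n_kv_heads, head_dim = 16, 8, 64
d_ffn = 2816
vocab = 50304
n_blocks = 4 + 1 + 2  # prelude + one shared loop block + coda

# Attention: Q and output projections are d_model x (16*64); K and V use 8 heads.
attn = 2 * d_model * (n_q_heads * head_dim) + 2 * d_model * (n_kv_heads * head_dim)
# SwiGLU: gate, up and down projections (assumed bias-free).
ffn = 3 * d_model * d_ffn
# Tied input/output embeddings are counted once.
embed = vocab * d_model

total = n_blocks * (attn + ffn) + embed
print(f"{total:,}")  # 134,086,656, about 134.1M
```

Under these assumptions the estimate lands on the 134.1M unique-parameter figure, with the embedding table accounting for roughly 51.5M of it.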

Training


Dataset	FineWeb (raw, `kjj0/fineweb10B-gpt2`)
Tokens consumed	~4.6B
Steps	17,500
Hardware	2× H100 on Modal
Wall clock	~3 hours
Total cost	~$22
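
The token count follows from the step count and the batch composition listed under Hyperparameters. A short arithmetic check, using only numbers from this card:

```python
# Tokens per optimizer step and total tokens consumed.
micro_batch, seq_len, n_gpus, grad_accum = 32, 1024, 2, 4
tokens_per_step = micro_batch * seq_len * n_gpus * grad_accum
steps = 17_500

total_tokens = tokens_per_step * steps
print(tokens_per_step)                  # 262144
print(f"{total_tokens / 1e9:.2f}B")     # 4.59B
```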

Hyperparameters:

Batch: 262,144 tokens/step (micro=32 × seq=1024 × 2 GPUs × accum=4)
Optimizer: Muon (matrices) + AdamW (norms, biases, embeddings)
LR: Muon 0.02, AdamW 3e-4
Schedule: 100-step warmup, 60% constant LR, 40% cosine decay to 0.1× peak
Precision: bf16 with fp32 logits
Loop depth: μ_rec=6, sampled per sequence from a Poisson distribution
Backward depth: μ_bwd=3, truncated BPTT (gradients flow only through the last 3 loops)
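
One plausible reading of the learning-rate schedule is sketched below. The function `lr_at` is hypothetical, and the exact boundaries are assumptions: linear warmup over the first 100 steps, constant LR until 60% of the total steps, then cosine decay over the remaining 40% down to 0.1× peak. The training code may define the phases differently.

```python
import math


def lr_at(step: int, total_steps: int, peak_lr: float, warmup_steps: int = 100,
          constant_frac: float = 0.6, final_ratio: float = 0.1) -> float:
    """Warmup -> constant -> cosine decay to final_ratio * peak_lr (assumed shape)."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps           # linear warmup from 0
    decay_start = round(constant_frac * total_steps)
    if step < decay_start:
        return peak_lr                                  # constant phase
    span = max(1, total_steps - decay_start)
    progress = min(1.0, (step - decay_start) / span)    # 0 -> 1 across the decay
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return peak_lr * (final_ratio + (1.0 - final_ratio) * cosine)


# Muon peak LR from the list above; 17,500 is assumed to be the schedule horizon.
for step in (0, 100, 10_000, 14_000, 17_500):
    print(step, round(lr_at(step, 17_500, 0.02), 6))
```

The same shape would apply to the AdamW parameter group with a peak of 3e-4.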

Results

Model	Architecture	Tokens	Val Loss (FineWeb)
HobbyLM-30M (prior)	Dense (8 layers)	1B	3.91
LoopLM-135M-naive (this)	Dense looped	4.6B	3.95
HobbyLM-130M MoE (sibling)	MoE (140M total / 62M active)	10B	3.30

At this scale, sparse MoE remains more sample-efficient than dense looped. Looping clearly helped vs the 30M dense baseline but didn't surpass MoE at matched parameters.

The Parcae Investigation (Honest Findings)

This project began as an attempt to reproduce Parcae's LTI stability mechanisms for looped LMs. Across 5 ablations, none of the Parcae variants beat the naive baseline:

#	Variant	Description	Final Val
1	Naive	`h_{t+1} = block(h_t + e)`	3.84 (FineWeb-Edu)
2	A matrix	+ LTI step in parallel	3.84 (tied)
3	+ input norm v1	Wrong arch flow	diverged
4	+ LTI before block	Fixed arch + B identity init	worse
5	+ B → AdamW (wd=0)	Match official optimizer routing	dramatically worse

Each "fix" that brought the implementation closer to the official Parcae code made performance worse. After checking the paper's Appendix Q and the official repo, and after multiple debugging passes, the working conclusion is:

Parcae's stability mechanisms appear to require larger scale (1B+ params, 100B+ tokens) to demonstrate benefit. At 135M params / 4.6B tokens, naive looped reuse is competitive enough.

The Parcae paper itself reports its stability tricks help most when training runs into "late-stage loss spikes after 170k steps." Our runs were at 17.5k steps. We never reached the regime where these mechanisms pay off.

Example Outputs (Cherry-picked)

Generated at temperature=0.8, top_k=50:

Prompt: "The advantages of solar energy include"

The advantages of solar energy include the advantages of solar energy. At the same time, solar energy is used for generating electricity, and solar energy is the first choice for solar power generation. Solar energy is generally renewable, and is considered a renewable energy.

Prompt: "Once upon a time, in a small village,"

Once upon a time, in a small village, where you could be greeted with a gentle, friendly face. This beautiful, charming village is situated on a calm, peaceful setting; with its peaceful nature and calm nature, this charming village does not exist.

Honest assessment: Locally fluent, syntactically valid English. Prone to repetition and invented facts. Expected behavior for a 135M model trained on ~4.6B tokens — not competitive with modern instruction-tuned models, but a clean from-scratch baseline.

Reproducibility

Full training code: github.com/harims95/LoopLM

spec.json in this repo contains the exact training configuration (CLI args, model config, train config, git commit hash, GPU type, PyTorch version).

To reproduce the training run:

git clone https://github.com/harims95/LoopLM
cd LoopLM
pip install -r requirements.txt
pip install modal && modal token new

# Download FineWeb shards to Modal volume
python -m modal run --detach training/modal_train.py \
    --action download --dataset fineweb --shards 50

# Train
python -m modal run --detach training/modal_train.py \
    --action train --preset 135M --steps 20000 \
    --run-name looplm_naive --gpus 2 --micro 32 \
    --seq-len 1024 --batch-tokens 262144 \
    --dataset fineweb \
    --overrides "use_a_matrix=false,use_input_norm=false"

Limitations

Not instruction-tuned. This is a base model only.
Small. 135M parameters; expect hallucination and limited factual recall.
Repetition. Training did nothing to discourage repetition; generation benefits from top_k sampling.
No .generate() polish. The HF wrapper returns logits; a vanilla sampling loop is in scripts/generate.py.
English only. Tokenizer is GPT-2 BPE; training data is FineWeb English.

Acknowledgments

@harishsg993010 — training infrastructure (Muon, data loader, Modal harness, optimizer setup)
Sandy Research — official Parcae implementation that helped me debug
The Parcae authors — paper and honest scaling analysis
kjj0 — FineWeb GPT-2 tokenized shards
Modal Labs — accessible H100 training

License

Apache 2.0

Citation

If you use this model or the Parcae findings, please cite:

@misc{looplm-135m-naive,
  author = {Hari},
  title = {LoopLM-135M-naive: A dense looped transformer with honest Parcae ablations},
  year = {2026},
  publisher = {HuggingFace},
  url = {https://huggingface.co/harims95/LoopLM-135M-naive}
}

Paper for harims95/LoopLM-135M-naive: Parcae: Scaling Laws For Stable Looped Language Models (arXiv:2604.12946, published Apr 14)