Conv-Routed Induction LM

A small, attention-free, sub-quadratic language model built for the BabyLM 2026 Strict-Small track (a ~10M-word training budget). It is designed to test a specific hypothesis: that a transformer's self-attention can be replaced by a division of labour between two cheaper, complementary primitives — one for local word order, one for exact long-range recall — and still match a same-scale attention baseline on grammar (BLiMP) and perplexity.

⚠️ This card describes the architecture (which is stable). Exact hyperparameters, sizes, and headline metrics are still being iterated and live in the repo's hyperparameters.json / training logs for each revision rather than here.

Architecture

Each layer is three residual sub-blocks; none is redundant:

1. Dynamic Conv (local, positional). A gated depthwise dilated convolution whose kernel weights are predicted per position from the token itself (content-adaptive local mixing, ~15-token reach). This is the "what just came before me" channel.
2. Induction Mixer (global, content-based, exact). For each token it finds the last M occurrences of the exact same token earlier in the sequence (a non-learned O(T log T) index built with sort/scatter, no attention matrix), softly ranks those occurrences by how well their surrounding context matches the present with a small multi-head score, and copies the raw representation of whatever token followed each one. In short: "what came after this token last time?" A learnable sink lets it abstain. Exactness and token identity are load-bearing: fuzzy or hashed matching destroys the effect.
3. SwiGLU FFN (per-token computation).
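
As a rough illustration of the first sub-block, the sketch below applies a causal, depthwise, dilated convolution whose tap weights come from the current token. It is a toy under stated assumptions, not the code in modeling_induction.py: `predict_kernel` is a hypothetical stand-in for the learned kernel predictor, the sigmoid gate reads the raw input instead of a learned projection, and the dilation value is illustrative.

```python
import math

def dynamic_conv(x, predict_kernel, dilation=2):
    """Causal depthwise conv whose taps are predicted per position.

    x: list of T feature vectors (lists of floats).
    predict_kernel: hypothetical callable mapping one feature vector to
        K tap weights; tap k reads position t - k * dilation.
    """
    out = []
    for t, x_t in enumerate(x):
        taps = predict_kernel(x_t)          # content-adaptive weights for position t
        mixed = [0.0] * len(x_t)
        for k, w in enumerate(taps):
            src = t - k * dilation
            if src < 0:                     # nothing before the start of the sequence
                continue
            for c, v in enumerate(x[src]):  # depthwise: channel c only reads channel c
                mixed[c] += w * v
        # Toy gate (assumption): sigmoid of the current token's own features.
        out.append([m / (1.0 + math.exp(-g)) for m, g in zip(mixed, x_t)])
    return out
```

With K taps at dilation d, each position sees back (K - 1) * d + 1 tokens. For example, 8 taps at dilation 2 would span 15 positions, though the kernel size and dilation actually used are not stated in this card.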

The design thesis: conv handles local order, induction handles long-range exact recall, the FFN computes — splitting the work that dense attention does into two parts with sharper inductive biases and no quadratic cost.
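
The long-range half of this split rests on an exact-match index. The sketch below shows one way such an index could be built with a single stable sort. It covers only the lookup step: the multi-head context score, the learnable sink, and the copy of representations are left out, and the function name is hypothetical.

```python
def last_m_followers(tokens, m):
    """For each position t, list the positions that followed the last `m`
    earlier occurrences of tokens[t], most recent first."""
    # A stable sort by token id groups equal tokens together while keeping
    # each group in sequence order. This is the O(T log T) step.
    order = sorted(range(len(tokens)), key=lambda i: tokens[i])
    followers = [[] for _ in tokens]
    for rank, t in enumerate(order):
        # Walk back through at most m predecessors inside the same group.
        for back in range(1, m + 1):
            if rank - back < 0:
                break
            prev = order[rank - back]
            if tokens[prev] != tokens[t]:
                break                      # left the group of identical tokens
            followers[t].append(prev + 1)  # the token that came after the match
    return followers
```

For `tokens = [5, 7, 5, 9, 5, 7]` and `m=2` this returns `[[], [], [1], [], [3, 1], [2]]`: the third occurrence of token 5 (position 4) points at positions 3 and 1, the tokens that followed its two earlier occurrences. Every returned position is at most t, so the lookup never reads the future.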

Why it is sub-quadratic

There is no T × T attention anywhere. The induction index is built with a sort and a scatter (O(T log T)), and each token reads only a fixed number (M) of prior continuations. Memory and compute scale near-linearly in sequence length.
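
A back-of-the-envelope comparison makes the scaling concrete. The function below counts rough per-layer operations with constants ignored; the values of T and M used with it are illustrative, not the model's settings.

```python
import math

def read_counts(T, M):
    """Rough per-layer operation counts for sequence length T (constants ignored)."""
    dense_attention = T * T                 # every token scores every token
    induction = T * math.log2(T) + T * M    # sort for the index, then M reads per token
    return dense_attention, induction
```

For example, `read_counts(1024, 4)` gives `(1048576, 14336.0)`, a gap that widens as T grows because only the first term is quadratic.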

Intended use & scope

Research artifact for data-efficient language modelling and architecture studies. It is a small model trained on a developmentally-motivated English corpus; it is not intended for production use, factual question answering, or deployment. Generations are short-range and reflect the small training budget.

How to load

The architecture is custom, so trust_remote_code=True is required (the modeling_induction.py file ships with every revision):

from transformers import AutoModelForCausalLM, AutoTokenizer

# Hub repo ID for this model
repo = "Lllllmmmmmm/conv-induction-babylm-strict-small"
# trust_remote_code=True lets Transformers run the custom modeling_induction.py
tok = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True)

The model is causal: it predicts the next token, and the index only ever references earlier positions. It is also padding-side agnostic: positions are derived from the attention mask and pad positions are zeroed, so left- and right-padded batches give identical results for the real tokens.

Learning-curve checkpoints are published on branches named chck_1M, chck_2M, …
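
One plausible reading of the padding-side behaviour described above is that each real token's position is the count of real tokens before it. The helper below is a hypothetical sketch of that idea for a single sequence; the model's actual position logic lives in modeling_induction.py and may differ.

```python
def positions_from_mask(attention_mask):
    """Position ids for one sequence: 0, 1, 2, ... over real tokens, 0 at pads."""
    positions, seen = [], 0
    for m in attention_mask:
        positions.append(seen if m else 0)  # pads get a zeroed position
        seen += m                           # only real tokens advance the counter
    return positions
```

A left-padded mask `[0, 0, 1, 1, 1]` yields `[0, 0, 0, 1, 2]` and a right-padded mask `[1, 1, 1, 0, 0]` yields `[0, 1, 2, 0, 0]`, so the three real tokens receive positions 0, 1, 2 in both layouts.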

Training data

BabyLM 2026 Strict-Small (~10M words of developmentally-plausible English), tokenised with a byte-level BPE vocabulary trained on the same corpus.

Limitations

Small capacity and budget: limited world knowledge and short effective context.
English, child-directed / developmental register; not representative of general web text.
A research architecture under active iteration — treat any single revision's numbers as provisional.

License

MIT. Code: https://github.com/joshua-taylor/conv-induction-babylm

Model size (Safetensors): 12.3M params, tensor type F32