Libraries

How to use marin-community/delphi-2e19-210Mparams-18.2Btokens with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="marin-community/delphi-2e19-210Mparams-18.2Btokens")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("marin-community/delphi-2e19-210Mparams-18.2Btokens")
model = AutoModelForCausalLM.from_pretrained("marin-community/delphi-2e19-210Mparams-18.2Btokens")
```

Local Apps

vLLM

How to use marin-community/delphi-2e19-210Mparams-18.2Btokens with vLLM:

Install from pip and serve model

```bash
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "marin-community/delphi-2e19-210Mparams-18.2Btokens"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "marin-community/delphi-2e19-210Mparams-18.2Btokens",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
```
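For callers who would rather use Python than curl, the same completion request can be sketched with the standard library alone. This is a minimal sketch, assuming the server started above is listening on `localhost:8000`; `build_completion_request` and `complete` are hypothetical helper names, not part of vLLM or SGLang.

```python
import json
import urllib.request

MODEL_ID = "marin-community/delphi-2e19-210Mparams-18.2Btokens"


def build_completion_request(prompt, base_url="http://localhost:8000",
                             max_tokens=512, temperature=0.5):
    """Build (but do not send) a POST request for the /v1/completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the first completion's text.

    Requires a running OpenAI-compatible server at the chosen base_url.
    """
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["choices"][0]["text"]
```

With a server running, `complete("Once upon a time,")` returns the generated continuation. Passing `base_url="http://localhost:30000"` targets a server on port 30000 instead. Because this is a base model, expect plain text continuation rather than instruction following.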

Use Docker

```bash
docker model run hf.co/marin-community/delphi-2e19-210Mparams-18.2Btokens
```

SGLang

How to use marin-community/delphi-2e19-210Mparams-18.2Btokens with SGLang:

Install from pip and serve model

```bash
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "marin-community/delphi-2e19-210Mparams-18.2Btokens" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "marin-community/delphi-2e19-210Mparams-18.2Btokens",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
```

Use Docker images

```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "marin-community/delphi-2e19-210Mparams-18.2Btokens" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "marin-community/delphi-2e19-210Mparams-18.2Btokens",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
```

Docker Model Runner
How to use marin-community/delphi-2e19-210Mparams-18.2Btokens with Docker Model Runner:
```
docker model run hf.co/marin-community/delphi-2e19-210Mparams-18.2Btokens
```

delphi-2e19-210Mparams-18.2Btokens

A 210M-parameter base model from the Delphi scaling suite. Trained at 2 × 10¹⁹ FLOPs on 18.2B tokens with the Delphi recipe.

About Delphi

Delphi is the Marin team's first open scaling suite, inspired by Pythia. It has three parts:

- a scaling recipe that maps compute budgets to model configurations,
- a scaling suite of models trained from that recipe at IsoFLOP budgets from 3 × 10¹⁸ to 1 × 10²³ FLOPs, and
- a scaling law that uses the smaller Delphi models to predict the larger ones.

A pre-registered forecast from that scaling law predicted the final loss of the largest Delphi run (1 × 10²³ FLOPs, 25 B parameters, 600 B tokens) within 0.2%, using 300× less compute than the training run itself. The same process forecasts downstream benchmarks — MMLU, HumanEval, and GSM8K — via a two-step regression combining compute and observational scaling laws.
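To make the idea of forecasting a large run from small ones concrete, here is a toy sketch of fitting a law on low-compute points and extrapolating. It assumes a plain power law, loss = a · C^b, fitted by least squares in log-log space. The functional form Delphi actually fits is not stated on this card, and the data points below are synthetic, generated from a made-up law purely to exercise the code.

```python
import math


def fit_power_law(compute, loss):
    """Least-squares fit of log(loss) = intercept + slope * log(compute)."""
    xs = [math.log(c) for c in compute]
    ys = [math.log(l) for l in loss]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict_loss(intercept, slope, compute):
    """Evaluate the fitted power law at a new compute budget."""
    return math.exp(intercept + slope * math.log(compute))


# Synthetic "small runs": NOT Delphi measurements, just a made-up law.
def toy_law(c):
    return 50.0 * c ** -0.05


small_budgets = [3e18, 1e19, 3e19, 1e20, 3e20]
intercept, slope = fit_power_law(small_budgets, [toy_law(c) for c in small_budgets])

# Extrapolate far beyond the largest fitted budget.
forecast = predict_loss(intercept, slope, 1e23)
```

On noiseless synthetic data the forecast recovers the generating law almost exactly. With real training runs, the quality of the extrapolation depends on the recipe keeping every run comparably well tuned, which is what the scaling recipe is for.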

See "Scaling Laws That Extrapolate 300× Past the Fit" for the recipe, fit, and downstream-eval projections. The full set of Delphi checkpoints — IsoFLOP grid points, held-out optima at 1e21/1e22/1e23 with multiple random seeds, and training intermediates — lives on marin-community on the Hub.

This is a research artifact, not a production model.

Model details


| Property | Value |
| --- | --- |
| Architecture | Qwen 3 (pre-norm decoder, RMSNorm, RoPE, QK-norm with learned scaling, SwiGLU MLPs) |
| Parameters | 210,054,272 |
| Hidden size | 640 |
| Layers | 7 |
| Attention heads | 5 |
| KV heads | 5 (no GQA) |
| Head dim | 128 |
| FFN intermediate | 2560 (MLP ratio 4) |
| Vocab size | 128,256 (Llama 3 tokenizer) |
| Max sequence length | 4096 |
| Position encoding | RoPE (θ = 500000, Llama 3-style scaling) |
| Bias terms | None |
| Tied embeddings | No |
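The parameter total can be reproduced from the other rows. The sketch below assumes the weight layout of the Hugging Face Qwen 3 implementation: two RMSNorm weight vectors per layer, one `head_dim`-sized QK-norm weight each for queries and keys, a final RMSNorm, and a separate output head because embeddings are untied. `qwen3_param_count` is a hypothetical helper written for this check.

```python
def qwen3_param_count(vocab, hidden, layers, heads, kv_heads, head_dim, ffn,
                      tied_embeddings=False):
    """Count parameters of a bias-free Qwen 3-style decoder."""
    embed = vocab * hidden
    lm_head = 0 if tied_embeddings else vocab * hidden

    attn = hidden * heads * head_dim            # q_proj
    attn += 2 * hidden * kv_heads * head_dim    # k_proj and v_proj
    attn += heads * head_dim * hidden           # o_proj
    qk_norm = 2 * head_dim                      # learned q_norm and k_norm scales
    mlp = 3 * hidden * ffn                      # SwiGLU: gate, up, down
    layer_norms = 2 * hidden                    # pre-attention and pre-MLP RMSNorm

    per_layer = attn + qk_norm + mlp + layer_norms
    final_norm = hidden
    return embed + lm_head + layers * per_layer + final_norm


total = qwen3_param_count(vocab=128_256, hidden=640, layers=7, heads=5,
                          kv_heads=5, head_dim=128, ffn=2560)
print(f"{total:,}")  # 210,054,272
```

Under these assumptions, roughly 164M of the 210M parameters sit in the input embedding and output head, which is typical for small models with a 128K vocabulary.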

Training


| Setting | Value |
| --- | --- |
| Compute | 2 × 10¹⁹ FLOPs |
| Tokens | 18,195,939,328 |
| Steps | 34,705 |
| Sequence length | 4096 |
| Optimizer | AdamH (Adam with Hyperball) |
| Recipe | Delphi (Complete(d)P-style scaling with `(T₀/T)^0.3` token-horizon LR correction) |
| LR schedule | WSD: 10% linear warmup, 20% linear decay, 0 floor |
| Precision | f32 master params, bf16 compute |
| Parallelism | FSDP |
| Data mixture | Nemotron-CC + StarCoderData + ProofPile 2 |
| Tokenizer | Llama 3 (vocab 128,256) |
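The WSD (warmup-stable-decay) schedule in the table can be sketched as a multiplier on the peak learning rate. This is an illustrative sketch, not the recipe code. It assumes warmup starts from zero, the decay occupies the final 20% of steps, and phase lengths are rounded down to whole steps; the actual rounding and endpoints may differ.

```python
def wsd_multiplier(step, total_steps, warmup_pct=10, decay_pct=20):
    """Return the fraction of peak LR at `step` for a WSD schedule with a 0 floor."""
    warmup_steps = total_steps * warmup_pct // 100
    decay_steps = total_steps * decay_pct // 100
    decay_start = total_steps - decay_steps

    if step < warmup_steps:
        return step / warmup_steps              # linear warmup from 0 to 1
    if step < decay_start or decay_steps == 0:
        return 1.0                              # stable phase at peak LR
    return max(0.0, (total_steps - step) / decay_steps)  # linear decay to 0


TOTAL_STEPS = 34_705  # from the table above
warmup_steps = TOTAL_STEPS * 10 // 100
decay_steps = TOTAL_STEPS * 20 // 100
```

For this run, that gives about 3,470 warmup steps and 6,941 decay steps, with the remaining steps spent at the peak learning rate.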

AdamH (Adam with Hyperball) constrains every projection weight to stay on the Frobenius-norm sphere it was initialized on, so weight decay has nothing to regularize away and drops out of the recipe. A Complete(d)P-style transfer rule with a (T₀/T)^0.3 correction lowers the learning rate as the token horizon T grows beyond T₀. Reference constants: B₀ = 64, T₀ = 2.5 B tokens, η₀ = 0.00630, η₀,Adam = 0.000656, ε₀ = 1.85 × 10⁻⁸. Recipe code: experiments/scaling_law_sweeps/completed_adamh.py.
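As a worked example of the token-horizon correction alone, the snippet below evaluates (T₀/T)^0.3 for this checkpoint's token count. It covers only that one factor. The full transfer rule in the recipe code also involves terms this card does not spell out, so the result is a multiplier, not this run's actual learning rate. `token_horizon_factor` is a hypothetical helper name.

```python
T0_TOKENS = 2.5e9        # reference token horizon T0 from the card
HORIZON_EXPONENT = 0.3   # exponent in the (T0 / T)^0.3 correction


def token_horizon_factor(tokens, t0=T0_TOKENS, exponent=HORIZON_EXPONENT):
    """Multiplicative LR correction for training on `tokens` instead of `t0`."""
    return (t0 / tokens) ** exponent


# This checkpoint's token count, from the training table.
factor = token_horizon_factor(18_195_939_328)
print(f"{factor:.3f}")
```

The factor comes out at roughly 0.55, so training about 7× past the reference horizon shrinks this term to a bit more than half.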

Companion releases

- All Delphi model checkpoints: marin-community on the Hub.
- Plot data behind every figure in the blog post: marin-community/delphi-blog-data (one config per figure, with wandb_url on every row).
- Pipelines that deterministically reproduce the training mixture from public Nemotron-CC, StarCoderData, and ProofPile 2 sources: see the Marin repo.

Evaluation

This checkpoint is part of the Delphi eval suite (experiments/exp1337_eval_suite.py), which scores every Delphi run alongside reference open-weights baselines (Qwen 3, Llama 2/3, OLMo 2, Marin 8B). Following the blog's two-step forecast, soft metrics (per-choice log-prob for multiple-choice tasks, bits-per-byte for generative tasks) carry the signal the scaling law is fit on, and a sigmoid fit on an external model pool maps soft metric to hard metric (accuracy, pass@1, exact-match). Below ~1e21 FLOPs the hard metrics stay near chance even when the underlying probabilities are improving smoothly; that is expected and is exactly why the soft metrics exist.
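The two kinds of metric can be illustrated with a short sketch. Bits-per-byte below uses the standard definition (total negative log-likelihood in nats, converted to bits and divided by the UTF-8 byte count). The sigmoid link is only a generic form: its parameter names and the values used are hypothetical, since the fitted parameterization lives in the blog post and eval suite, not on this card.

```python
import math


def bits_per_byte(total_nll_nats, text):
    """Convert a summed token NLL (in nats) into bits per UTF-8 byte of `text`."""
    n_bytes = len(text.encode("utf-8"))
    return total_nll_nats / (math.log(2) * n_bytes)


def soft_to_hard(soft, midpoint, slope, floor, ceiling=1.0):
    """Generic sigmoid link from a soft metric to a hard metric.

    `floor` is the chance-level score (e.g. 0.25 for 4-way multiple choice).
    Use a negative `slope` when lower soft values are better, as with
    bits-per-byte. All parameters here are hypothetical placeholders.
    """
    return floor + (ceiling - floor) / (1.0 + math.exp(-slope * (soft - midpoint)))


# A 4-byte string scored at 8 bits of total NLL is 2 bits per byte.
example_bpb = bits_per_byte(8 * math.log(2), "abcd")
```

Far from the midpoint on the weak side, the link flattens at the chance floor, which matches the description of small-compute runs: the soft metric keeps improving while the hard metric barely moves.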

Limitations

- Trained on an English-heavy web mixture; no multilingual coverage.
- Pretrained only: no instruction tuning, RLHF, or safety alignment.
- The Delphi recipe targets compute-optimal training, not inference-cost-aware overtraining; for inference-heavy deployments, an overtrained smaller model may be preferable. The blog's "off-optimal training" section quantifies the penalty.
- This is one checkpoint in a much larger Delphi release; pick the one that matches your compute / parameter / token regime, or browse the full set at marin-community.

Citation

```bibtex
@misc{held2026delphi,
  title  = {Scaling Laws That Extrapolate 300× Past the Fit},
  author = {Held, Will and {Marin Community}},
  year   = {2026},
  url    = {https://openathena.ai/blog/delphi}
}
```
