Libraries

How to use marin-community/delphi-9e18-669Mparams-2.3Btokens with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="marin-community/delphi-9e18-669Mparams-2.3Btokens")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("marin-community/delphi-9e18-669Mparams-2.3Btokens")
model = AutoModelForCausalLM.from_pretrained("marin-community/delphi-9e18-669Mparams-2.3Btokens")

Local Apps

vLLM

How to use marin-community/delphi-9e18-669Mparams-2.3Btokens with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "marin-community/delphi-9e18-669Mparams-2.3Btokens"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "marin-community/delphi-9e18-669Mparams-2.3Btokens",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

SGLang

How to use marin-community/delphi-9e18-669Mparams-2.3Btokens with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "marin-community/delphi-9e18-669Mparams-2.3Btokens" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "marin-community/delphi-9e18-669Mparams-2.3Btokens",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "marin-community/delphi-9e18-669Mparams-2.3Btokens" \
        --host 0.0.0.0 \
        --port 30000

Docker Model Runner
How to use marin-community/delphi-9e18-669Mparams-2.3Btokens with Docker Model Runner:
```
docker model run hf.co/marin-community/delphi-9e18-669Mparams-2.3Btokens
```

delphi-9e18-669Mparams-2.3Btokens

A 669M-parameter base model from the Delphi scaling suite. Trained at 9 × 10¹⁸ FLOPs on 2.3B tokens with the Delphi recipe.
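
As a rough sanity check on the numbers above, the sketch below applies the common `6 * N * D` approximation for dense-transformer training FLOPs to the parameter and token counts listed on this card. The 6ND rule is an assumption brought in for illustration, not necessarily how the Delphi recipe counts FLOPs; with all parameters (embeddings included) it lands at roughly 9.4 × 10¹⁸, a few percent above the nominal 9 × 10¹⁸ budget.

```python
# Rough FLOP estimate with the 6*N*D rule of thumb (an assumption, not the
# Delphi recipe's own accounting). N and D are taken from this model card.
N_PARAMS = 669_160_448      # total parameters, embeddings included
N_TOKENS = 2_336_227_328    # training tokens
NOMINAL_BUDGET = 9e18       # nominal IsoFLOP budget of this checkpoint


def approx_training_flops(n_params: int, n_tokens: int) -> int:
    """Forward + backward cost of a dense transformer, approximated as 6*N*D."""
    return 6 * n_params * n_tokens


flops = approx_training_flops(N_PARAMS, N_TOKENS)
ratio = flops / NOMINAL_BUDGET
print(f"6ND estimate: {flops:.3e} FLOPs ({ratio:.2f}x the nominal budget)")
```

The small gap is expected: different conventions (for example, excluding embedding parameters or adding attention FLOPs) shift the estimate by a few percent.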

About Delphi

Delphi is the Marin team's first open scaling suite, inspired by Pythia. It has three parts:

a scaling recipe that maps compute budgets to model configurations,
a scaling suite of models trained from that recipe at IsoFLOP budgets from 3 × 10¹⁸ to 1 × 10²³ FLOPs, and
a scaling law which uses the smaller Delphi models to predict the larger ones.

A pre-registered forecast from that scaling law predicted the final loss of the largest Delphi run (1 × 10²³ FLOPs, 25 B parameters, 600 B tokens) within 0.2%, using 300× less compute than the training run itself. The same process forecasts downstream benchmarks — MMLU, HumanEval, and GSM8K — via a two-step regression combining compute and observational scaling laws.
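
To make the idea of "fit on small runs, forecast a large one" concrete, here is a minimal sketch that assumes a pure power law `loss = a * C^(-b)` and fits it by least squares in log-log space. The functional form, the coefficients, and the list of budgets are illustrative assumptions; the data points are synthetic values generated from a known law, not Delphi measurements, and the actual Delphi scaling law is described in the blog post.

```python
import math


def fit_power_law(compute, loss):
    """Least-squares fit of log(loss) = log(a) - b * log(compute)."""
    xs = [math.log(c) for c in compute]
    ys = [math.log(v) for v in loss]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (a, b)


def predict(a, b, compute):
    """Evaluate the fitted power law at a new compute budget."""
    return a * compute ** (-b)


# Synthetic points from a known law (a=20, b=0.05); NOT Delphi results.
TRUE_A, TRUE_B = 20.0, 0.05
budgets = [3e18, 9e18, 3e19, 9e19, 3e20]          # hypothetical fit budgets
losses = [TRUE_A * c ** (-TRUE_B) for c in budgets]

a, b = fit_power_law(budgets, losses)
forecast = predict(a, b, 1e23)                     # extrapolate far past the fit
print(f"fitted a={a:.3f}, b={b:.4f}, forecast at 1e23 FLOPs: {forecast:.4f}")
```

On noiseless synthetic data the fit recovers the generating coefficients; real runs add noise and typically need a richer functional form.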

See "Scaling Laws That Extrapolate 300× Past the Fit" for the recipe, fit, and downstream-eval projections. The full set of Delphi checkpoints — IsoFLOP grid points, held-out optima at 1e21/1e22/1e23 with multiple random seeds, and training intermediates — lives on marin-community on the Hub.

This is a research artifact, not a production model.

Model details


| Property | Value |
|---|---|
| Architecture | Qwen 3 (pre-norm decoder, RMSNorm, RoPE, QK-norm with learned scaling, SwiGLU MLPs) |
| Parameters | 669,160,448 |
| Hidden size | 1280 |
| Layers | 13 |
| Attention heads | 10 |
| KV heads | 10 (no GQA) |
| Head dim | 128 |
| FFN intermediate | 5120 (MLP ratio 4) |
| Vocab size | 128,256 (Llama 3 tokenizer) |
| Max sequence length | 4096 |
| Position encoding | RoPE (θ = 500000, Llama 3-style scaling) |
| Bias terms | None |
| Tied embeddings | No |
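
The parameter count can be reproduced from the other rows of this table. The sketch below assumes the usual Qwen 3 layout: separate Q/K/V/O projections without biases, a three-matrix SwiGLU MLP, two RMSNorm weights per layer plus one final norm, and QK-norm scales of size `head_dim` shared across heads. That layout is an assumption about the implementation, but it matches the listed total exactly.

```python
# Architecture values copied from the model details table.
HIDDEN = 1280
LAYERS = 13
HEADS = 10
KV_HEADS = 10
HEAD_DIM = 128
FFN = 5120
VOCAB = 128_256


def count_params() -> int:
    """Count parameters for an untied, bias-free Qwen 3-style decoder."""
    embed = VOCAB * HIDDEN                 # input embedding
    lm_head = VOCAB * HIDDEN               # output head (embeddings are not tied)

    q_proj = HIDDEN * HEADS * HEAD_DIM
    k_proj = HIDDEN * KV_HEADS * HEAD_DIM
    v_proj = HIDDEN * KV_HEADS * HEAD_DIM
    o_proj = HEADS * HEAD_DIM * HIDDEN
    qk_norm = 2 * HEAD_DIM                 # assumed: one scale vector each for Q and K
    mlp = 3 * HIDDEN * FFN                 # gate, up, and down projections
    layer_norms = 2 * HIDDEN               # pre-attention and pre-MLP RMSNorm

    per_layer = q_proj + k_proj + v_proj + o_proj + qk_norm + mlp + layer_norms
    final_norm = HIDDEN
    return embed + lm_head + LAYERS * per_layer + final_norm


total = count_params()
print(f"{total:,} parameters")
```

Almost half of the total sits in the two vocabulary matrices, which is typical for small models with a 128k-entry tokenizer.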

Training


| Setting | Value |
|---|---|
| Compute | 9 × 10¹⁸ FLOPs |
| Tokens | 2,336,227,328 |
| Steps | 35,647 |
| Sequence length | 4096 |
| Optimizer | AdamH (Adam with Hyperball) |
| Recipe | Delphi (Complete(d)P-style scaling with `(T₀/T)^0.3` token-horizon LR correction) |
| LR schedule | WSD: 10% linear warmup, 20% linear decay, 0 floor |
| Precision | f32 master params, bf16 compute |
| Parallelism | FSDP |
| Data mixture | Nemotron-CC + StarCoderData + ProofPile 2 |
| Tokenizer | Llama 3 (vocab 128,256) |
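
The WSD (warmup-stable-decay) schedule row above can be sketched as a small function. This is a minimal illustration rather than the training code: the peak learning rate is normalized to 1.0 because this card does not list the per-run peak, and the rounding of the phase boundaries is an assumption.

```python
TOTAL_STEPS = 35_647  # step count from the training table


def wsd_lr(step: int, total_steps: int, peak_lr: float,
           warmup_frac: float = 0.10, decay_frac: float = 0.20) -> float:
    """Linear warmup, constant plateau, then linear decay to a floor of 0."""
    warmup_steps = int(warmup_frac * total_steps)
    decay_steps = int(decay_frac * total_steps)
    decay_start = total_steps - decay_steps

    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to the peak.
        return peak_lr * step / warmup_steps
    if step < decay_start:
        # Stable phase: hold the peak learning rate.
        return peak_lr
    # Decay: ramp linearly from the peak down to 0 at the final step.
    remaining = max(total_steps - step, 0)
    return peak_lr * remaining / decay_steps


for s in (0, 1_000, 20_000, 32_000, TOTAL_STEPS):
    print(s, round(wsd_lr(s, TOTAL_STEPS, peak_lr=1.0), 4))
```

Multiplying the returned value by the run's actual peak learning rate gives the absolute schedule.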

AdamH (Adam with Hyperball) constrains every projection weight to stay on the Frobenius-norm sphere it was initialized on, so weight decay has nothing to regularize away and drops out of the recipe. A Complete(d)P-style transfer rule with a (T₀/T)^0.3 correction sets the learning rate as the token horizon grows. Reference constants: B₀ = 64, T₀ = 2.5 B tokens, η₀ = 0.00630, η₀,Adam = 0.000656, ε₀ = 1.85 × 10⁻⁸. Recipe code: experiments/scaling_law_sweeps/completed_adamh.py.
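
Two pieces of the paragraph above are easy to illustrate in isolation. The sketch below computes only the `(T₀/T)^0.3` token-horizon factor for this checkpoint and shows a generic projection of a weight matrix back onto a fixed Frobenius-norm sphere. How the factor combines with B₀ and η₀ lives in the recipe code and is not reproduced here, and the projection is a plain norm-rescaling illustration, not the AdamH implementation; both function names are hypothetical.

```python
import math

T0_TOKENS = 2.5e9            # reference token horizon T0 from this card
RUN_TOKENS = 2_336_227_328   # token count of this checkpoint


def token_horizon_factor(tokens: float, t0: float = T0_TOKENS,
                         exponent: float = 0.3) -> float:
    """Learning-rate correction factor (T0 / T) ** 0.3."""
    return (t0 / tokens) ** exponent


def frobenius_norm(weight) -> float:
    """Square root of the sum of squared entries of a 2-D list."""
    return math.sqrt(sum(x * x for row in weight for x in row))


def project_to_sphere(weight, radius: float):
    """Rescale a matrix so its Frobenius norm equals `radius` (hypothetical helper)."""
    scale = radius / frobenius_norm(weight)
    return [[x * scale for x in row] for row in weight]


factor = token_horizon_factor(RUN_TOKENS)
print(f"LR correction factor for this run: {factor:.4f}")

w_init = [[3.0, 4.0]]                       # toy weight with norm 5
radius = frobenius_norm(w_init)
w_after_step = [[6.0, 8.0]]                 # pretend an update doubled the norm
w_projected = project_to_sphere(w_after_step, radius)
print(w_projected, frobenius_norm(w_projected))
```

Because this run's token count sits slightly below T₀, the factor is slightly above 1; longer horizons push it below 1.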

Companion releases

All Delphi model checkpoints: marin-community on the Hub.
Plot data behind every figure in the blog post: marin-community/delphi-blog-data (one config per figure, with wandb_url on every row).
Pipelines that deterministically reproduce the training mixture from public Nemotron-CC, StarCoderData, and ProofPile 2 sources: see the Marin repo.

Evaluation

This checkpoint is part of the Delphi eval suite (experiments/exp1337_eval_suite.py), which scores every Delphi run alongside reference open-weights baselines (Qwen 3, Llama 2/3, OLMo 2, Marin 8B). Following the blog's two-step forecast, soft metrics (per-choice log-prob for multiple-choice tasks, bits-per-byte for generative tasks) carry the signal the scaling law is fit on, and a sigmoid fit on an external model pool maps soft metric to hard metric (accuracy, pass@1, exact-match). Below ~1e21 FLOPs the hard metrics stay near chance even when the underlying probabilities are improving smoothly; that is expected and is exactly why the soft metrics exist.
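
The soft-to-hard mapping described above can be sketched in a few lines. The bits-per-byte conversion is the standard definition; the sigmoid link with a chance-level floor is an assumed functional form, and its `midpoint`, `slope`, and `chance` values here are placeholders, not fitted Delphi parameters.

```python
import math


def bits_per_byte(total_nll_nats: float, n_bytes: int) -> float:
    """Convert a summed negative log-likelihood in nats to bits per byte."""
    return total_nll_nats / (math.log(2) * n_bytes)


def soft_to_hard(soft: float, midpoint: float, slope: float,
                 chance: float, ceiling: float = 1.0) -> float:
    """Map a soft metric to a hard metric with a sigmoid between chance and ceiling.

    Use a negative slope when lower soft values are better (e.g. bits per byte).
    """
    s = 1.0 / (1.0 + math.exp(-slope * (soft - midpoint)))
    return chance + (ceiling - chance) * s


# Placeholder parameters for a 4-way multiple-choice task (chance = 0.25).
for bpb in (1.4, 1.0, 0.8, 0.6):
    acc = soft_to_hard(bpb, midpoint=0.7, slope=-12.0, chance=0.25)
    print(f"bpb={bpb:.1f} -> predicted accuracy {acc:.3f}")
```

With parameters like these, a large improvement in the soft metric far from the midpoint barely moves the hard metric, which is the near-chance plateau the paragraph above describes for small budgets.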

Limitations

Trained on an English-heavy web mixture; no multilingual coverage.
Pretrained-only — no instruction tuning, RLHF, or safety alignment.
The Delphi recipe targets compute-optimal training, not inference-cost-aware overtraining; for inference-heavy deployments, an overtrained smaller model may be preferable. The blog's "off-optimal training" section quantifies the penalty.
This is one checkpoint in a much larger Delphi release; pick the one that matches your compute / parameter / token regime, or browse the full set at marin-community.

Citation

@misc{held2026delphi,
  title  = {Scaling Laws That Extrapolate 300× Past the Fit},
  author = {Held, Will and {Marin Community}},
  year   = {2026},
  url    = {https://openathena.ai/blog/delphi}
}
