delphi-9e18-669Mparams-2.3Btokens
A 669M-parameter base model from the Delphi scaling suite. Trained at 9 × 10¹⁸ FLOPs on 2.3B tokens with the Delphi recipe.
About Delphi
Delphi is the Marin team's first open scaling suite, inspired by Pythia. It has three parts:
- a scaling recipe that maps compute budgets to model configurations,
- a scaling suite of models trained from that recipe at IsoFLOP budgets from 3 × 10¹⁸ to 1 × 10²³ FLOPs, and
- a scaling law which uses the smaller Delphi models to predict the larger ones.
A pre-registered forecast from that scaling law predicted the final loss of the largest Delphi run (1 × 10²³ FLOPs, 25 B parameters, 600 B tokens) within 0.2%, using 300× less compute than the training run itself. The same process forecasts downstream benchmarks — MMLU, HumanEval, and GSM8K — via a two-step regression combining compute and observational scaling laws.
See "Scaling Laws That Extrapolate 300× Past the Fit"
for the recipe, fit, and downstream-eval projections. The full set of Delphi
checkpoints — IsoFLOP grid points, held-out optima at 1e21/1e22/1e23 with
multiple random seeds, and training intermediates — lives on
marin-community on the Hub.
This is a research artifact, not a production model.
Model details
| Property | Value |
| --- | --- |
| Architecture | Qwen 3 (pre-norm decoder, RMSNorm, RoPE, QK-norm with learned scaling, SwiGLU MLPs) |
| Parameters | 669,160,448 |
| Hidden size | 1280 |
| Layers | 13 |
| Attention heads | 10 |
| KV heads | 10 (no GQA) |
| Head dim | 128 |
| FFN intermediate | 5120 (MLP ratio 4) |
| Vocab size | 128,256 (Llama 3 tokenizer) |
| Max sequence length | 4096 |
| Position encoding | RoPE (θ = 500000, Llama 3-style scaling) |
| Bias terms | None |
| Tied embeddings | No |
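The parameter count can be cross-checked against the other rows of the table. The sketch below is not taken from the Marin codebase; it assumes the usual Qwen 3-style layout (four attention projections, a three-matrix SwiGLU MLP, two RMSNorm scale vectors per layer, one final RMSNorm, and one QK-norm scale vector of size head dim for each of Q and K, shared across heads). Under those assumptions the total matches the table.

```python
# Values copied from the model details table above.
HIDDEN = 1280
LAYERS = 13
HEADS = 10
KV_HEADS = 10
HEAD_DIM = 128
FFN = 5120
VOCAB = 128_256


def count_parameters() -> int:
    """Count weights for a bias-free, untied, Qwen 3-style decoder (assumed layout)."""
    # Untied input embedding and output head: two VOCAB x HIDDEN matrices.
    embeddings = 2 * VOCAB * HIDDEN

    # Attention: Q and O use all heads; K and V use the KV heads.
    q_and_o = 2 * HIDDEN * HEADS * HEAD_DIM
    k_and_v = 2 * HIDDEN * KV_HEADS * HEAD_DIM
    # QK-norm: one learned scale vector of size HEAD_DIM each for Q and K
    # (assumption: shared across heads).
    qk_norm = 2 * HEAD_DIM

    # SwiGLU MLP: gate, up, and down projections.
    mlp = 3 * HIDDEN * FFN

    # Two RMSNorm scale vectors per layer (before attention and before the MLP).
    layer_norms = 2 * HIDDEN

    per_layer = q_and_o + k_and_v + qk_norm + mlp + layer_norms
    final_norm = HIDDEN
    return embeddings + LAYERS * per_layer + final_norm


total = count_parameters()
print(f"{total:,}")  # 669,160,448
```

Because the embeddings are untied, the two vocabulary matrices account for 328,335,360 of the parameters, roughly half the model.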
Training
| Setting | Value |
| --- | --- |
| Compute | 9 × 10¹⁸ FLOPs |
| Tokens | 2,336,227,328 |
| Steps | 35,647 |
| Sequence length | 4096 |
| Optimizer | AdamH (Adam with Hyperball) |
| Recipe | Delphi (Complete(d)P-style scaling with (T₀/T)^0.3 token-horizon LR correction) |
| LR schedule | WSD: 10% linear warmup, 20% linear decay, 0 floor |
| Precision | f32 master params, bf16 compute |
| Parallelism | FSDP |
| Data mixture | Nemotron-CC + StarCoderData + ProofPile 2 |
| Tokenizer | Llama 3 (vocab 128,256) |
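Two rows of the training table lend themselves to a quick numerical sketch. The first function uses the common rule of thumb of 6 FLOPs per parameter per token; this is an assumption, and the Delphi recipe may count FLOPs differently, but the estimate lands close to the stated 9 × 10¹⁸ budget. The second function is one plausible reading of the WSD row (linear warmup over the first 10% of steps, a flat plateau, then linear decay to zero over the last 20%). The `peak_lr` argument and the rounding of the phase boundaries are hypothetical; the actual schedule lives in the Marin training code.

```python
PARAMS = 669_160_448      # from the model details table
TOKENS = 2_336_227_328    # from the training table
TOTAL_STEPS = 35_647      # from the training table


def approx_train_flops(params: int, tokens: int) -> float:
    """Rule-of-thumb training compute: 6 FLOPs per parameter per token."""
    return 6.0 * params * tokens


def wsd_lr(step: int, total_steps: int, peak_lr: float,
           warmup_frac: float = 0.10, decay_frac: float = 0.20) -> float:
    """Warmup-stable-decay: linear warmup, flat plateau, linear decay to 0."""
    warmup_steps = int(warmup_frac * total_steps)
    decay_steps = max(int(decay_frac * total_steps), 1)
    decay_start = total_steps - decay_steps
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    if step < decay_start:
        return peak_lr
    # Linear ramp from peak_lr at decay_start down to 0 at total_steps.
    remaining = max(total_steps - step, 0)
    return peak_lr * remaining / decay_steps


print(f"approx. compute: {approx_train_flops(PARAMS, TOKENS):.2e} FLOPs")

# With peak_lr=1.0 the function returns the schedule multiplier itself.
for step in (0, 1_000, 20_000, 32_000, TOTAL_STEPS):
    print(step, round(wsd_lr(step, TOTAL_STEPS, peak_lr=1.0), 4))
```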
AdamH (Adam with Hyperball) constrains every projection weight to stay on the
Frobenius-norm sphere it was initialized on, so weight decay has nothing to
regularize away and falls out of the recipe. A Complete(d)P-style transfer
rule with a (T₀/T)^0.3 correction adjusts the learning rate as the token
horizon T grows. Reference constants: B₀ = 64, T₀ = 2.5 B tokens,
η₀ = 0.00630, η₀,Adam = 0.000656, ε₀ = 1.85 × 10⁻⁸. Recipe code:
experiments/scaling_law_sweeps/completed_adamh.py.
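The two ideas in this paragraph can be illustrated in isolation. The sketch below is not the AdamH implementation from the recipe file. It shows only (a) the token-horizon factor (T₀/T)^0.3 taken on its own, without whatever other terms the full transfer rule applies, and (b) a generic rescaling step that puts a matrix back on a sphere of fixed Frobenius norm. The function names are hypothetical, and how AdamH actually enforces the constraint is an assumption here.

```python
import math

T0_TOKENS = 2.5e9  # reference token horizon T0 from the paragraph above


def token_horizon_factor(tokens: float, t0: float = T0_TOKENS,
                         exponent: float = 0.3) -> float:
    """Multiplier (T0 / T) ** 0.3 applied to a reference learning rate."""
    return (t0 / tokens) ** exponent


def frobenius_norm(matrix: list[list[float]]) -> float:
    """Square root of the sum of squared entries."""
    return math.sqrt(sum(x * x for row in matrix for x in row))


def retract_to_sphere(matrix: list[list[float]],
                      radius: float) -> list[list[float]]:
    """Rescale a matrix so its Frobenius norm equals `radius`."""
    norm = frobenius_norm(matrix)
    if norm == 0.0:
        # A zero matrix has no direction to rescale; return a copy unchanged.
        return [row[:] for row in matrix]
    scale = radius / norm
    return [[x * scale for x in row] for row in matrix]


# Horizons shorter than T0 get a factor above 1, longer ones a factor below 1.
print(token_horizon_factor(2_336_227_328))  # this checkpoint's token count
print(token_horizon_factor(600e9))          # the largest run's token count

# Toy weight: initial norm 5, perturbed by an update, then pulled back.
w_init = [[3.0, 0.0], [0.0, 4.0]]
radius = frobenius_norm(w_init)
w_updated = [[2.9, 0.2], [-0.1, 4.3]]
w_projected = retract_to_sphere(w_updated, radius)
print(frobenius_norm(w_projected))
```

With the norm pinned this way, an update can change only the direction of a weight matrix, which is the sense in which weight decay has nothing left to shrink.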
Companion releases
- All Delphi model checkpoints: marin-community on the Hub.
- Plot data behind every figure in the blog post: marin-community/delphi-blog-data (one config per figure, with wandb_url on every row).
- Pipelines that deterministically reproduce the training mixture from public Nemotron-CC, StarCoderData, and ProofPile 2 sources: see the Marin repo.
Evaluation
This checkpoint is part of the Delphi eval suite
(experiments/exp1337_eval_suite.py),
which scores every Delphi run alongside reference open-weights baselines
(Qwen 3, Llama 2/3, OLMo 2, Marin 8B). Following the blog's two-step
forecast, soft metrics (per-choice log-prob for multiple-choice tasks,
bits-per-byte for generative tasks) carry the signal the scaling law is fit on,
and a sigmoid fit on an external model pool maps soft metric to hard metric
(accuracy, pass@1, exact-match). Below ~1e21 FLOPs the hard metrics stay near
chance even when the underlying probabilities are improving smoothly; that is
expected and is exactly why the soft metrics exist.
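To make the soft-versus-hard distinction concrete, here is a minimal sketch rather than the eval suite's code. `bits_per_byte` is the standard conversion from a summed negative log-likelihood in nats to bits per byte. `soft_to_hard` is one possible sigmoid link with a chance-level floor; its parameterization and every numeric value in the usage lines are hypothetical and are not fitted to any Delphi data.

```python
import math


def bits_per_byte(total_nll_nats: float, num_bytes: int) -> float:
    """Convert a summed negative log-likelihood (nats) into bits per byte."""
    return total_nll_nats / (math.log(2) * num_bytes)


def soft_to_hard(soft: float, chance: float, ceiling: float,
                 midpoint: float, slope: float) -> float:
    """Sigmoid link from a soft metric to a hard metric in [chance, ceiling]."""
    z = slope * (soft - midpoint)
    return chance + (ceiling - chance) / (1.0 + math.exp(-z))


# Hypothetical 4-way multiple-choice task: chance accuracy is 0.25.
# The soft metric here is the log-prob of the correct choice (higher is better).
for soft in (-3.0, -2.5, -2.0, -0.5, 0.0):
    acc = soft_to_hard(soft, chance=0.25, ceiling=1.0, midpoint=-0.5, slope=6.0)
    print(f"soft={soft:+.1f}  predicted accuracy={acc:.4f}")
```

In this toy setting the soft metric moves from -3.0 to -2.0 while the predicted accuracy stays within a fraction of a percentage point of chance, which mirrors the behavior described above for runs below ~1e21 FLOPs.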
Limitations
- Trained on an English-heavy web mixture; no multilingual coverage.
- Pretrained-only — no instruction tuning, RLHF, or safety alignment.
- The Delphi recipe targets compute-optimal training, not inference-cost-aware overtraining; for inference-heavy deployments, an overtrained smaller model may be preferable. The blog's "off-optimal training" section quantifies the penalty.
- This is one checkpoint in a much larger Delphi release; pick the one that
matches your compute / parameter / token regime, or browse the full set at
marin-community.
Citation
@misc{held2026delphi,
title = {Scaling Laws That Extrapolate 300× Past the Fit},
author = {Held, Will and {Marin Community}},
year = {2026},
url = {https://openathena.ai/blog/delphi}
}