Instructions to use MorphMind-AI/CFM-Proof-3B with libraries, inference providers, notebooks, and local apps.

Libraries

How to use MorphMind-AI/CFM-Proof-3B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="MorphMind-AI/CFM-Proof-3B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("MorphMind-AI/CFM-Proof-3B")
# CFM-Proof-3B is a text-only causal LM (Qwen2.5-3B base), so load it as one
model = AutoModelForCausalLM.from_pretrained("MorphMind-AI/CFM-Proof-3B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Render the chat template and tokenize in one step
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use MorphMind-AI/CFM-Proof-3B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "MorphMind-AI/CFM-Proof-3B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "MorphMind-AI/CFM-Proof-3B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
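
The same request can be sent from Python. The sketch below uses only the standard library and mirrors the curl call above. It assumes the server is reachable at `http://localhost:8000` and that the response follows the OpenAI chat-completions shape (`choices[0].message.content`). The helper names `build_chat_request` and `chat` are illustrative, not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "MorphMind-AI/CFM-Proof-3B"

def build_chat_request(content, base_url="http://localhost:8000"):
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": content}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def chat(content, base_url="http://localhost:8000", timeout=60):
    """Send the request and return the assistant's reply text.

    Assumes an OpenAI-style response body; requires a running server.
    """
    request = build_chat_request(content, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running, `chat("What is the capital of France?")` should return the reply as a string. Passing `base_url="http://localhost:30000"` targets a server on the port used in the SGLang examples.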

Use Docker

docker model run hf.co/MorphMind-AI/CFM-Proof-3B

SGLang

How to use MorphMind-AI/CFM-Proof-3B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "MorphMind-AI/CFM-Proof-3B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "MorphMind-AI/CFM-Proof-3B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "MorphMind-AI/CFM-Proof-3B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "MorphMind-AI/CFM-Proof-3B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use MorphMind-AI/CFM-Proof-3B with Docker Model Runner:
```
docker model run hf.co/MorphMind-AI/CFM-Proof-3B
```

CFM-Proof-3B · MorphMind

A control model that reads a mathematical proof and tells you where it breaks. Give CFM-Proof-3B a theorem and its proof and it returns a structured verdict (support or refute), points to the offending step when it can, and explains why. It is built as a high-recall reviewer: it errs toward flagging, catching about 83% of injected errors on the full test set, so that few slip past the human who reviews its flags.

CFM-Proof-3B is the first release in MorphMind's Control Foundation Model (CFM) line — models whose job is not to generate science but to check it.

By MorphMind. Research preview.

Benchmark — proof-error recall vs. frontier models

Recall (share of injected proof errors caught) on the same 150-proof held-out sample — every model given JSON output and an adequate token budget, so the comparison is like-for-like:

Model	Recall (errors caught)	Size
base Qwen2.5-3B (zero-shot)	0.04	3B
Claude Opus 4.8	0.61	frontier
GPT-5.4	0.84	frontier
CFM-Proof-3B (ours)	0.88	3B · single GPU

On this held-out sample, CFM-Proof-3B is competitive with frontier models on error catch rate at roughly 1/100 the size, and it runs on a single GPU. On the full 1,977-proof test set and on an entirely held-out domain, its robust recall is 0.83 and 0.82 respectively (localization 0.30 and 0.28), and it is consistent across fields (cs.CC 0.87 · cs.IT 0.84 · cs.LG 0.84 · math.OC 0.84 · stat 0.80). Read the table as a recall screen, not a verdict on overall capability: the models sit at different precision/recall trade-offs. Opus is more conservative (higher precision, lower recall), while CFM-Proof-3B and GPT-5.4 favor recall, the right bias for a first-pass screen that must not miss errors.
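
For readers who want to reproduce this kind of number on their own data, here is a minimal sketch of how recall and precision are conventionally computed from per-proof labels. The function name and the sample labels are made up for illustration. They are not benchmark data, and the evaluation harness behind the table may differ.

```python
def precision_recall(predicted_flags, has_error):
    """Compute (precision, recall) for a flag-or-pass screen.

    predicted_flags[i] is True if the model flagged proof i;
    has_error[i] is True if proof i really contains an error.
    """
    pairs = list(zip(predicted_flags, has_error))
    tp = sum(1 for p, t in pairs if p and t)        # errors caught
    fp = sum(1 for p, t in pairs if p and not t)    # false alarms
    fn = sum(1 for p, t in pairs if not p and t)    # errors missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical labels, for illustration only.
flags = [True, True, True, False, True, False]
truth = [True, True, False, True, False, False]
precision, recall = precision_recall(flags, truth)
```

In this toy example two of four flags are real errors (precision 0.5) and two of three real errors are caught (recall about 0.67). A recall-first screen accepts the lower precision in exchange for fewer misses.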

When & how to use it

Use CFM-Proof-3B as a fast first-pass reviewer — to catch slips before a human deep-read, to triage a stack of submissions, or to vet AI-generated proofs. It is most valuable wherever a missed error is expensive: refereeing, internal review, grading, automated theorem generation.

The unit of review is one claim + its proof — not a whole paper. For a long paper, screen it piece by piece:

1. Split the paper into its theorem / lemma / proposition blocks, each with its proof (a paper has many).
2. Run CFM-Proof-3B on each block independently.
3. Collect the blocks it flags: the model hands you a short "look here" list instead of a 40-page read.

This keeps every input short (one proof, the form it was trained on) and scales cleanly to long papers and large batches. Because it is tuned for recall, treat its flags as "worth a human's 30 seconds" — it is a screen, not a final judge.
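
The splitting step is left to the user. One possible sketch for LaTeX sources is shown below. It assumes amsthm-style environments named `theorem`, `lemma`, and `proposition`, each immediately followed by a `proof` environment. Real papers often use custom environment names or put text between a claim and its proof, so treat this as a starting point. The function body is an assumption, not code shipped with the model.

```python
import re

# Assumed environment names; adjust to the paper's own macros.
_CLAIM = re.compile(
    r"\\begin\{(theorem|lemma|proposition)\}(.*?)\\end\{\1\}", re.DOTALL
)
_PROOF = re.compile(r"\s*\\begin\{proof\}(.*?)\\end\{proof\}", re.DOTALL)

def split_into_proof_blocks(latex_source):
    """Return (claim, proof) pairs for claims directly followed by a proof."""
    blocks = []
    for claim in _CLAIM.finditer(latex_source):
        # Only accept a proof that starts right after the claim ends.
        proof = _PROOF.match(latex_source, claim.end())
        if proof:
            blocks.append((claim.group(2).strip(), proof.group(1).strip()))
    return blocks

sample = r"""
\begin{lemma}Stated without proof.\end{lemma}
Some discussion.
\begin{theorem}For all real $x$, $x^2 \ge 0$.\end{theorem}
\begin{proof}A square of a real number is nonnegative.\end{proof}
"""
blocks = split_into_proof_blocks(sample)
```

Claims with no adjacent proof, like the lemma in the sample, are skipped, so only complete claim and proof pairs reach the model.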

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("MorphMind-AI/CFM-Proof-3B")
model = AutoModelForCausalLM.from_pretrained(
    "MorphMind-AI/CFM-Proof-3B", torch_dtype=torch.bfloat16, device_map="auto"
)

# System prompt: asks for a JSON-only review in a fixed schema
SYSTEM = ("You are a scientific correctness reviewer. Review the theorem and proof and respond ONLY "
          "with JSON: {\"analysis\":...,\"verdict\":\"support|refute\","
          "\"error_spans\":[{\"text\":...,\"why\":...}],\"action\":\"accept|suggest_edit\"}")

def review(theorem, proof):
    """Review one theorem/proof pair and return the model's raw JSON string."""
    msgs = [
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": f"THEOREM:\n{theorem}\n\nPROOF:\n{proof}"},
    ]
    # return_dict=True gives generate() both input_ids and attention_mask
    inputs = tok.apply_chat_template(
        msgs, add_generation_prompt=True, return_dict=True, return_tensors="pt"
    ).to(model.device)
    out = model.generate(**inputs, max_new_tokens=320, do_sample=False)  # greedy decoding
    # Decode only the newly generated tokens, not the prompt
    return tok.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)

# For a long paper (split_into_proof_blocks is a helper you supply):
#     for theorem, proof in split_into_proof_blocks(paper): review(theorem, proof)
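
Since `review` returns a raw string, a caller usually needs to parse it before building the "look here" list. The sketch below is one way to do that, assuming the output follows the JSON schema in the system prompt. The helper names are illustrative, and treating unparseable output as a flag is a design choice made here in keeping with the recall-first intent, not documented model behavior.

```python
import json

def parse_verdict(raw):
    """Return the parsed review dict, or None if it is not the expected JSON."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        data = json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    if data.get("verdict") not in ("support", "refute"):
        return None
    return data

def needs_human(raw):
    """Flag a block when the verdict is 'refute' or the output is unparseable."""
    data = parse_verdict(raw)
    return data is None or data["verdict"] == "refute"

# Hypothetical model output, for illustration only.
example = ('{"analysis": "Step 2 reverses the inequality.", "verdict": "refute", '
           '"error_spans": [{"text": "a <= b", "why": "should be a >= b"}], '
           '"action": "suggest_edit"}')
flagged = needs_human(example)
```

Routing malformed output to a human rather than silently accepting it means a formatting failure cannot hide a missed error.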

How it was built

A short supervised warm-start, then RLVR — Reinforcement Learning from Verifiable Rewards: the model proposes a verdict, an automatic checker validates it against ground truth, and only verifiably-correct answers are reinforced. No model-as-judge. Trained on public arXiv LaTeX proofs across statistics, probability, optimization, CS-theory, and ML theory.
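
To make the "verifiable reward" idea concrete, here is a deliberately simplified sketch of a checker that scores an output against a ground-truth verdict. The actual reward used in training is not described in this card. The function below is an assumption for illustration only, and a real reward might also score localization or output format.

```python
import json

def verdict_reward(model_output, true_verdict):
    """Return 1.0 if the output is valid JSON whose verdict matches ground truth.

    Illustrative only: a binary, automatically checkable reward with no
    model-as-judge in the loop.
    """
    try:
        data = json.loads(model_output)
    except json.JSONDecodeError:
        return 0.0  # malformed output earns nothing
    if not isinstance(data, dict):
        return 0.0
    return 1.0 if data.get("verdict") == true_verdict else 0.0
```

Because the ground truth comes from errors injected into known proofs, the check is a plain comparison, which is what makes the reward verifiable.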

Limitations

CFM-Proof-3B is a recall-first screen, and is deliberately built that way:

It over-flags (precision ≈ 0.5) — by design. It is far cheaper to dismiss a false alarm in seconds than to ship a missed error, so it errs toward flagging. Keep a human in the loop.
It catches ≈83% of errors, not 100% — a strong screen, not a proof of correctness.
It localizes the exact step ≈30% of the time; otherwise it tells you the proof is suspect and why, and you scan.
It was trained on representative injected errors (reversed inequalities, sign flips, altered constants); coverage of every real-world mistake will keep improving with each release.
This is a research preview; a permissively-licensed, larger CFM-Proof-7B is in training.
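
The precision and recall figures above translate into a rough review workload. The sketch below is back-of-the-envelope arithmetic, assuming the quoted rates (recall ≈ 0.83, precision ≈ 0.5) carry over to your batch, which they may not for real-world rather than injected errors. The function name is illustrative.

```python
def screening_estimate(n_erroneous, recall=0.83, precision=0.5):
    """Estimate screening outcomes for a batch with n_erroneous flawed proofs."""
    caught = recall * n_erroneous      # flawed proofs that get flagged
    flagged = caught / precision       # total flags a human must look at
    return {
        "caught": caught,
        "missed": n_erroneous - caught,
        "flagged": flagged,
        "false_alarms": flagged - caught,
    }

estimate = screening_estimate(100)
```

For 100 flawed proofs this gives roughly 83 caught and 17 missed, at the cost of about 166 flags to review, half of them false alarms.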

License

Released under the MorphMind CFM Research License (see LICENSE), which incorporates the Qwen Research License of the underlying Qwen2.5-3B base. Research / non-commercial use, with attribution to MorphMind and Qwen. For commercial licensing, contact MorphMind (morphmind.ai).

Citation

MorphMind. CFM-Proof-3B: a control foundation model for scientific-proof correctness. 2026.

Model size: 3B params (Safetensors)
Tensor type: BF16

Model tree for MorphMind-AI/CFM-Proof-3B
Base model: Qwen/Qwen2.5-3B
Finetuned: Qwen/Qwen2.5-3B-Instruct
Finetuned (1355): MorphMind-AI/CFM-Proof-3B (this model)