How to use MorphMind-AI/CFM-Proof-3B with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="MorphMind-AI/CFM-Proof-3B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly (CFM-Proof-3B is a text-only causal LM built on Qwen2.5-3B)
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("MorphMind-AI/CFM-Proof-3B")
model = AutoModelForCausalLM.from_pretrained("MorphMind-AI/CFM-Proof-3B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
How to use MorphMind-AI/CFM-Proof-3B with vLLM:
Install from pip and serve model
```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "MorphMind-AI/CFM-Proof-3B"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "MorphMind-AI/CFM-Proof-3B",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
How to use MorphMind-AI/CFM-Proof-3B with SGLang:
Install from pip and serve model
```bash
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "MorphMind-AI/CFM-Proof-3B" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "MorphMind-AI/CFM-Proof-3B",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker images:
```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "MorphMind-AI/CFM-Proof-3B" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "MorphMind-AI/CFM-Proof-3B",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
How to use MorphMind-AI/CFM-Proof-3B with Docker Model Runner:
```bash
docker model run hf.co/MorphMind-AI/CFM-Proof-3B
```
CFM-Proof-3B · MorphMind
A control model that reads a mathematical proof and tells you where it breaks. Give CFM-Proof-3B a theorem and its proof and it returns a structured verdict (support or refute), pinpoints the offending step, and explains why. It is built as a high-recall reviewer: it surfaces nearly every questionable step so a human misses almost nothing.
CFM-Proof-3B is the first release in MorphMind's Control Foundation Model (CFM) line: models whose job is not to generate science but to check it.
By MorphMind. Research preview.
Benchmark: proof-error recall vs. frontier models
Recall (the share of injected proof errors caught) on the same 150-proof held-out sample. Every model was given a JSON output format and an adequate token budget, so the comparison is like-for-like:
| Model | Recall (errors caught) | Size |
|---|---|---|
| base Qwen2.5-3B (zero-shot) | 0.04 | 3B |
| Claude Opus 4.8 | 0.61 | frontier |
| GPT-5.4 | 0.84 | frontier |
| CFM-Proof-3B (ours) | 0.88 | 3B · single GPU |
On this held-out sample, CFM-Proof-3B is competitive with frontier models on error catch-rate at roughly 1/100 the size, running on a single GPU. On the full 1,977-proof test set and on an entirely held-out domain, its robust recall is 0.83 and 0.82 respectively (localization 0.30 and 0.28), and it is consistent across fields (cs.CC 0.87 · cs.IT 0.84 · cs.LG 0.84 · math.OC 0.84 · stat 0.80). Read the table as a measure of recall only, not a verdict on overall capability: the models sit at different precision/recall trade-offs. Opus is more conservative (higher precision, lower recall), while CFM-Proof-3B and GPT-5.4 favor recall, the right bias for a first-pass screen that must not miss errors.
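To make the metrics in the table and paragraph above concrete, here is a minimal sketch of how recall, precision and localization could be computed from per-proof results. It assumes a hypothetical `ProofResult` record holding a ground-truth error label, whether the model flagged the proof (verdict "refute"), and whether its error span hit the injected step. The card does not publish its scoring code, so the names and especially the localization denominator are assumptions.

```python
from dataclasses import dataclass


@dataclass
class ProofResult:
    """Hypothetical per-proof evaluation record (field names are illustrative)."""
    has_error: bool          # ground truth: an error was injected into this proof
    flagged: bool            # the model's verdict was "refute"
    localized: bool = False  # the model's error span hit the injected step


def screen_metrics(results: list[ProofResult]) -> dict[str, float]:
    """Recall, precision, and localization rate for a proof-error screen."""
    tp = sum(r.has_error and r.flagged for r in results)        # errors caught
    fp = sum(r.flagged and not r.has_error for r in results)    # false alarms
    fn = sum(r.has_error and not r.flagged for r in results)    # errors missed
    hit = sum(r.has_error and r.flagged and r.localized for r in results)
    n_errors = tp + fn
    return {
        "recall": tp / n_errors if n_errors else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Assumption: localization is measured over all erroneous proofs.
        "localization": hit / n_errors if n_errors else 0.0,
    }
```

A model can raise recall simply by flagging more proofs, which is why recall is only meaningful next to precision.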
When & how to use it
Use CFM-Proof-3B as a fast first-pass reviewer: to catch slips before a human deep-read, to triage a stack of submissions, or to vet AI-generated proofs. It is most valuable wherever a missed error is expensive: refereeing, internal review, grading, automated theorem generation.
The unit of review is one claim plus its proof, not a whole paper. For a long paper, screen it piece by piece:
- Split the paper into its theorem / lemma / proposition blocks, each with its proof (a paper has many).
- Run CFM-Proof-3B on each block independently.
- Collect the blocks it flags. The model hands you a short "look here" list instead of a 40-page read.
This keeps every input short (one proof, the form it was trained on) and scales cleanly to long papers and large batches. Because it is tuned for recall, treat its flags as "worth a human's 30 seconds": it is a screen, not a final judge.
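The first step in the list above, splitting a paper into claim-plus-proof blocks, can be sketched for LaTeX sources with a regular expression. The card's usage example calls a `split_into_proof_blocks` helper without defining it; this version is hypothetical and assumes amsthm-style environments where each claim is immediately followed by a `\begin{proof}...\end{proof}` environment. Claims with no adjacent proof are skipped, and nested proofs or custom environment names would need more care.

```python
import re

# Hypothetical set of theorem-like environments to screen (starred forms also match).
CLAIM_ENVS = ("theorem", "lemma", "proposition", "corollary", "claim")
_ENV = "|".join(CLAIM_ENVS)

_BLOCK_RE = re.compile(
    r"\\begin\{(?P<env>" + _ENV + r")\*?\}"          # \begin{lemma} or \begin{lemma*}
    r"(?P<claim>(?:(?!\\end\{(?P=env)\*?\}).)*)"    # claim text, never crossing its own \end
    r"\\end\{(?P=env)\*?\}\s*"                      # the matching \end{...}, then whitespace
    r"\\begin\{proof\}(?P<proof>.*?)\\end\{proof\}",  # the proof that immediately follows
    re.DOTALL,
)


def split_into_proof_blocks(paper: str) -> list[tuple[str, str]]:
    """Return (claim, proof) pairs found in a LaTeX source string, in order."""
    return [
        (m.group("claim").strip(), m.group("proof").strip())
        for m in _BLOCK_RE.finditer(paper)
    ]
```

Any optional argument such as `[Name]` stays in the claim text, which gives the reviewer a little extra context. Each returned pair has the (theorem, proof) shape that the `review` function in the usage example takes.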
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("MorphMind-AI/CFM-Proof-3B")
model = AutoModelForCausalLM.from_pretrained(
    "MorphMind-AI/CFM-Proof-3B", torch_dtype=torch.bfloat16, device_map="auto"
)

SYSTEM = ("You are a scientific correctness reviewer. Review the theorem and proof and respond ONLY "
          "with JSON: {\"analysis\":...,\"verdict\":\"support|refute\","
          "\"error_spans\":[{\"text\":...,\"why\":...}],\"action\":\"accept|suggest_edit\"}")

def review(theorem, proof):
    """Review one theorem + proof pair and return the model's raw JSON string."""
    msgs = [{"role": "system", "content": SYSTEM},
            {"role": "user", "content": f"THEOREM:\n{theorem}\n\nPROOF:\n{proof}"}]
    # return_dict=True yields input_ids plus the attention mask for generate()
    inputs = tok.apply_chat_template(msgs, add_generation_prompt=True, tokenize=True,
                                     return_dict=True, return_tensors="pt").to(model.device)
    # Greedy (deterministic) decoding
    out = model.generate(**inputs, max_new_tokens=320, do_sample=False)
    # Decode only the newly generated tokens, not the prompt
    prompt_len = inputs["input_ids"].shape[1]
    return tok.decode(out[0, prompt_len:], skip_special_tokens=True)

# For a long paper (split_into_proof_blocks is a placeholder you supply):
# for theorem, proof in split_into_proof_blocks(paper):
#     review(theorem, proof)
```
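`review()` returns the model's raw text. Below is a minimal sketch for turning those outputs into the short "look here" list described above, assuming the output follows the JSON schema in `SYSTEM`. `parse_review` and `look_here` are hypothetical helpers, not part of the release. Sending unparseable output to a human instead of treating it as "accept" is a design choice that matches the recall-first intent.

```python
import json


def parse_review(raw: str) -> dict:
    """Parse one review() output; raise ValueError if it is not a usable verdict."""
    # Take the outermost {...} so stray text around the JSON does not break parsing.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object in model output")
    verdict = json.loads(raw[start:end + 1])  # JSONDecodeError is a ValueError
    if not isinstance(verdict, dict) or verdict.get("verdict") not in ("support", "refute"):
        raise ValueError("missing or unexpected 'verdict' field")
    verdict.setdefault("error_spans", [])
    return verdict


def look_here(blocks, raw_outputs):
    """Keep only the (theorem, proof) blocks a human should read, with reasons."""
    flagged = []
    for (theorem, _proof), raw in zip(blocks, raw_outputs):
        try:
            v = parse_review(raw)
        except ValueError:
            # Recall-first choice: malformed output goes to a human, not to "accept".
            flagged.append({"theorem": theorem, "why": ["unparseable model output"]})
            continue
        if v["verdict"] == "refute":
            reasons = [s.get("why", "") for s in v["error_spans"] if isinstance(s, dict)]
            flagged.append({"theorem": theorem, "why": reasons})
    return flagged
```

For example, `look_here(blocks, [review(t, p) for t, p in blocks])` returns only the blocks worth a human read, each paired with the model's stated reasons.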
How it was built
A short supervised warm-start, then RLVR (Reinforcement Learning from Verifiable Rewards): the model proposes a verdict, an automatic checker validates it against ground truth, and only verifiably correct answers are reinforced. No model-as-judge. Trained on public arXiv LaTeX proofs across statistics, probability, optimization, CS theory, and ML theory.
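The RLVR loop described above depends on a reward that can be computed mechanically, with no judge model involved. Here is a minimal sketch of such a reward. It assumes each training proof carries a ground-truth "support" or "refute" label and that the model emits the JSON schema from the usage example. The function name and the binary scoring are illustrative assumptions, not MorphMind's training code; a real checker might also score whether the flagged span matches the injected step.

```python
import json


def verifiable_reward(model_output: str, gold_verdict: str) -> float:
    """Binary reward: 1.0 only if the output parses and its verdict matches the label."""
    # Hypothetical checker: the card does not publish MorphMind's actual reward.
    try:
        verdict = json.loads(model_output).get("verdict")
    except (ValueError, AttributeError):
        # Invalid JSON (ValueError) or a non-object such as a list (AttributeError)
        return 0.0
    return 1.0 if verdict == gold_verdict else 0.0
```

Because the reward is a deterministic comparison against a stored label, it cannot be gamed by persuasive prose the way a model-as-judge can.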
Limitations
CFM-Proof-3B is a recall-first screen, and is deliberately built that way:
- It over-flags (precision ≈ 0.5), by design. It is far cheaper to dismiss a false alarm in seconds than to ship a missed error, so it errs toward flagging. Keep a human in the loop.
- It catches ≈83% of errors, not 100%: a strong screen, not a proof of correctness.
- It localizes the exact step ≈30% of the time; otherwise it tells you the proof is suspect and why, and you scan.
- It was trained on representative injected errors (reversed inequalities, sign flips, altered constants); coverage of every real-world mistake will keep improving with each release.
- This is a research preview; a permissively-licensed, larger CFM-Proof-7B is in training.
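The approximate rates above translate into a concrete review budget. The back-of-the-envelope sketch below uses the card's figures (recall ≈ 0.83, precision ≈ 0.5) as defaults, with a hypothetical batch size and error rate. It holds precision fixed, which is a simplification: in practice precision shifts with how many of the screened proofs actually contain errors.

```python
def expected_screen_load(n_blocks: int, error_rate: float,
                         recall: float = 0.83, precision: float = 0.5) -> dict:
    """Expected outcome of screening n_blocks proofs, from recall and precision alone."""
    n_errors = n_blocks * error_rate   # proofs that really contain an error
    caught = recall * n_errors         # true flags
    flags = caught / precision         # total flags, since precision = caught / flags
    return {
        "flags_to_review": flags,
        "false_alarms": flags - caught,
        "missed_errors": n_errors - caught,
    }
```

Take 200 blocks where 10% contain an error. The sketch predicts about 33 flags to check, roughly 17 of them false alarms, plus 3 to 4 missed errors. That is far less reading than 200 full proofs, but not zero risk.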
License
Released under the MorphMind CFM Research License (see LICENSE), which incorporates the
Qwen Research License of the underlying Qwen2.5-3B base. Research / non-commercial use, with
attribution to MorphMind and Qwen. For commercial licensing, contact MorphMind (morphmind.ai).
Citation
MorphMind. CFM-Proof-3B: a control foundation model for scientific-proof correctness. 2026.
