CFM-Proof-7B · MorphMind
A control model that reads a mathematical proof and tells you where it breaks. Give CFM-Proof-7B a theorem and its proof and it returns a structured verdict (support or refute), quotes the spans it considers faulty, and explains why. It is built as a high-recall reviewer: it flags about 95% of injected errors on the test set, accepting extra false alarms so that a human reviewer misses very little.
CFM-Proof-7B is the flagship of MorphMind's Control Foundation Model (CFM) line — models whose job is not to generate science but to check it. It is a full-parameter fine-tune that scales up CFM-Proof-3B, lifting recall from 0.83 to 0.95.
By MorphMind. Research preview.
Benchmark — proof-error recall vs. frontier models
Recall (share of injected proof errors caught) on the same 150-proof held-out sample. Every model was asked for JSON output and given an adequate token budget, so the comparison is like-for-like:
| Model | Recall (errors caught) | Size |
|---|---|---|
| base Qwen (untuned) | 0.04 | — |
| Claude Opus 4.8 | 0.61 | frontier |
| GPT-5.4 | 0.84 | frontier |
| CFM-Proof-3B | 0.88 | 3B |
| CFM-Proof-7B (ours) | 0.96 | 7B · single GPU |
On the full 1,977-proof test set and on an entirely held-out domain, CFM-Proof-7B's recall is 0.95 in both cases: it catches 95% of injected errors while running on a single GPU, at a fraction of the cost of frontier APIs. Read the table as a recall screen, not a verdict on overall capability. The models sit at different precision/recall trade-offs: Opus is more conservative (higher precision, lower recall), while CFM-Proof-7B favors recall, the right bias for a first-pass screen that must not miss errors.
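To make the two metrics concrete, here is a minimal sketch of how recall and precision follow from review outcomes. The function name and the counts in the usage lines are illustrative assumptions, not MorphMind's evaluation code or benchmark data.

```python
def recall_precision(true_pos: int, false_neg: int, false_pos: int) -> tuple[float, float]:
    """Return (recall, precision) from confusion counts.

    true_pos:  faulty proofs the reviewer flagged
    false_neg: faulty proofs the reviewer missed
    false_pos: correct proofs the reviewer flagged anyway
    """
    faulty = true_pos + false_neg      # all proofs that really contain an error
    flagged = true_pos + false_pos     # all proofs the reviewer flagged
    recall = true_pos / faulty if faulty else 0.0
    precision = true_pos / flagged if flagged else 0.0
    return recall, precision


# Illustrative counts only (not taken from the benchmark above).
recall, precision = recall_precision(true_pos=19, false_neg=1, false_pos=19)
print(f"recall={recall:.2f} precision={precision:.2f}")
```

A reviewer can score high on recall while flagging many correct proofs, which is why the table should not be read as a ranking on precision.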
When & how to use it
Use CFM-Proof-7B as a fast first-pass reviewer — to catch slips before a human deep-read, to triage a stack of submissions, or to vet AI-generated proofs. It is most valuable wherever a missed error is expensive: refereeing, internal review, grading, automated theorem generation.
The unit of review is one claim + its proof — not a whole paper. For a long paper, screen it piece by piece:
- Split the paper into its theorem / lemma / proposition blocks, each with its proof.
- Run CFM-Proof-7B on each block independently.
- Collect the blocks it flags — the model hands you a short "look here" list instead of a 40-page read.
This keeps every input short (one proof, the form it was trained on) and scales cleanly to long papers and large batches. Because it is tuned for recall, treat its flags as "worth a human's 30 seconds" — it is a screen, not a final judge.
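The splitting step can be sketched as follows for LaTeX sources. This is a rough illustration, not MorphMind's preprocessing: it assumes the paper uses standard `theorem` / `lemma` / `proposition` / `corollary` environments, that each proof sits in a `proof` environment directly after its statement, and that environments are not nested. The environment list is an assumption you would adapt to the paper's own macros.

```python
import re

# Statement environments to look for (assumed; extend for a paper's own macros).
STATEMENT_ENVS = ("theorem", "lemma", "proposition", "corollary")

_STATEMENT = re.compile(
    r"\\begin\{(" + "|".join(STATEMENT_ENVS) + r")\}(.*?)\\end\{\1\}",
    re.DOTALL,
)
_PROOF = re.compile(r"\s*\\begin\{proof\}(.*?)\\end\{proof\}", re.DOTALL)


def split_into_proof_blocks(latex: str) -> list[tuple[str, str]]:
    """Return (claim, proof) pairs for statements immediately followed by a proof."""
    blocks = []
    for stmt in _STATEMENT.finditer(latex):
        # Only accept a proof that starts right after the statement ends.
        proof = _PROOF.match(latex, stmt.end())
        if proof:
            blocks.append((stmt.group(2).strip(), proof.group(1).strip()))
    return blocks
```

Statements without an adjacent proof are skipped, since the unit of review is a claim together with its proof.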
Minimal usage with Transformers:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("MorphMind-AI/CFM-Proof-7B")
model = AutoModelForCausalLM.from_pretrained(
    "MorphMind-AI/CFM-Proof-7B", torch_dtype=torch.bfloat16, device_map="auto"
)

# System prompt: ask for a JSON-only verdict in a fixed schema.
SYSTEM = ("You are a scientific correctness reviewer. Review the theorem and proof and respond ONLY "
          "with JSON: {\"analysis\":...,\"verdict\":\"support|refute\","
          "\"error_spans\":[{\"text\":...,\"why\":...}],\"action\":\"accept|suggest_edit\"}")

def review(theorem, proof):
    """Review one theorem/proof block and return the model's raw JSON text."""
    msgs = [{"role": "system", "content": SYSTEM},
            {"role": "user", "content": f"THEOREM:\n{theorem}\n\nPROOF:\n{proof}"}]
    inputs = tok.apply_chat_template(msgs, add_generation_prompt=True, tokenize=True,
                                     return_dict=True, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=320, do_sample=False)
    # Decode only the newly generated tokens, not the prompt.
    return tok.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)

# For a long paper (split_into_proof_blocks is a splitter you supply):
# for theorem, proof in split_into_proof_blocks(paper): review(theorem, proof)
```
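The model's reply is a JSON string, so a small parsing step is useful before building the "look here" list. The sketch below assumes the reply follows the schema given in the system prompt (`analysis`, `verdict`, `error_spans`, `action`); the helper names `parse_review` and `needs_human_look` are hypothetical and not part of any MorphMind package.

```python
import json


def parse_review(raw: str) -> dict:
    """Extract and sanity-check the JSON verdict from the model's raw text."""
    # Tolerate stray text around the object by slicing from the first "{" to the last "}".
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")
    data = json.loads(raw[start : end + 1])
    if data.get("verdict") not in ("support", "refute"):
        raise ValueError(f"unexpected verdict: {data.get('verdict')!r}")
    data.setdefault("error_spans", [])
    return data


def needs_human_look(data: dict) -> bool:
    """Flag a block if the model refutes it or quotes any suspect span."""
    return data["verdict"] == "refute" or bool(data["error_spans"])
```

Malformed replies raise `ValueError` (including `json.JSONDecodeError`, which subclasses it). For a recall-first screen it is probably safer to route such blocks to a human than to drop them silently.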
How it was built
A full-parameter fine-tune of Qwen2.5-7B-Instruct, trained with RLVR — Reinforcement Learning from Verifiable Rewards: the model proposes a verdict, an automatic checker validates it against ground truth, and only verifiably-correct answers are reinforced. No model-as-judge. Trained on public arXiv LaTeX proofs across statistics, probability, optimization, CS-theory, and ML theory.
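As a rough illustration of what a verifiable reward can look like, the sketch below scores a reply 1.0 only when it is valid JSON whose verdict matches the known label. The card does not describe the actual checker or reward shaping, so the binary exact-match rule and the function name are assumptions.

```python
import json


def verifiable_reward(model_output: str, ground_truth_verdict: str) -> float:
    """Return 1.0 if the output is a JSON object whose verdict matches the label, else 0.0."""
    try:
        predicted = json.loads(model_output).get("verdict")
    except (json.JSONDecodeError, AttributeError):
        # Not valid JSON, or valid JSON that is not an object: no reward.
        return 0.0
    return 1.0 if predicted == ground_truth_verdict else 0.0
```

Because the label comes from the known injected error rather than from another model's opinion, the reward can be checked mechanically.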
Limitations
CFM-Proof-7B is a recall-first screen, and is deliberately built that way:
- It over-flags (precision ≈ 0.5) — by design. It is far cheaper to dismiss a false alarm in seconds than to ship a missed error, so it errs toward flagging. Keep a human in the loop.
- It catches ≈95% of errors, not 100% — a strong screen, not a proof of correctness.
- It localizes the exact step ≈10% of the time; otherwise it tells you the proof is suspect and why, and you scan.
- It was trained on representative injected errors (reversed inequalities, sign flips, altered constants); coverage of every real-world mistake will keep improving with each release.
- This is a research preview; a multi-domain CFM-7B (adding methodology-conformance and novelty checks) is in training.
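The figures above translate into a rough reviewing workload. The sketch below is back-of-envelope arithmetic using the card's recall (≈0.95) and precision (≈0.5). The share of faulty blocks is an assumption you supply, and treating precision as fixed is a simplification, since precision shifts when that share differs from the test set.

```python
def screening_estimate(n_blocks: int, faulty_share: float,
                       recall: float = 0.95, precision: float = 0.5) -> dict:
    """Estimate flags, catches, false alarms, and misses for a batch of proof blocks."""
    faulty = n_blocks * faulty_share            # blocks that really contain an error
    caught = faulty * recall                    # faulty blocks the screen flags
    flagged = caught / precision if precision else 0.0  # all flags, true and false
    return {
        "flagged": flagged,
        "true_errors_caught": caught,
        "false_alarms": flagged - caught,
        "errors_missed": faulty - caught,
    }


# Hypothetical batch: 200 blocks, 10% of them faulty.
for name, value in screening_estimate(n_blocks=200, faulty_share=0.10).items():
    print(f"{name}: {value:.0f}")
```

Under these assumptions a human reads roughly the flagged blocks instead of the whole batch, and about half of those reads turn out to be false alarms.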
License
Released under the MorphMind CFM Research License (see LICENSE). The underlying Qwen2.5-7B base is
Apache-2.0; this fine-tune is distributed for research / non-commercial use, with attribution to
MorphMind and Qwen. For commercial licensing, contact MorphMind (morphmind.ai).
Citation
MorphMind. CFM-Proof-7B: a control foundation model for scientific-proof correctness. 2026.
