Instructions for using MorphMind-AI/CFM-Methods-3B with libraries, inference servers, and local apps.

Libraries

How to use MorphMind-AI/CFM-Methods-3B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="MorphMind-AI/CFM-Methods-3B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("MorphMind-AI/CFM-Methods-3B")
model = AutoModelForCausalLM.from_pretrained("MorphMind-AI/CFM-Methods-3B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use MorphMind-AI/CFM-Methods-3B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "MorphMind-AI/CFM-Methods-3B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "MorphMind-AI/CFM-Methods-3B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/MorphMind-AI/CFM-Methods-3B

SGLang

How to use MorphMind-AI/CFM-Methods-3B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "MorphMind-AI/CFM-Methods-3B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "MorphMind-AI/CFM-Methods-3B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "MorphMind-AI/CFM-Methods-3B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "MorphMind-AI/CFM-Methods-3B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
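
The curl calls above send a generic question. For methodology review, the same OpenAI-compatible endpoint can be called from Python with the reviewer system prompt given later in this card. The following is a minimal sketch using only the standard library; the helper names `build_request` and `review_via_server`, and the `temperature` and `max_tokens` values, are illustrative assumptions rather than settings the authors specify.

```python
import json
import urllib.request

MODEL_ID = "MorphMind-AI/CFM-Methods-3B"

# System prompt copied from the card's "When & how to use it" section.
SYSTEM_PROMPT = (
    "You are a scientific methodology reviewer. Review the methods and respond ONLY with JSON: "
    '{"analysis":...,"verdict":"support|refute",'
    '"error_spans":[{"text":...,"why":...}],"action":"accept|suggest_edit"}'
)


def build_request(methods, base_url="http://localhost:8000"):
    """Build a chat-completions POST request (hypothetical helper)."""
    payload = {
        "model": MODEL_ID,
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": "METHODS:\n" + methods},
        ],
        "temperature": 0,   # assumption: deterministic output for screening
        "max_tokens": 320,  # assumption: mirrors max_new_tokens in the card
    }
    return urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def review_via_server(methods, base_url="http://localhost:8000"):
    """Send one methods block to a running server and return the raw reply text."""
    with urllib.request.urlopen(build_request(methods, base_url)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the vLLM command above the default base URL applies; for the SGLang examples, pass `base_url="http://localhost:30000"`.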

Docker Model Runner
How to use MorphMind-AI/CFM-Methods-3B with Docker Model Runner:
```
docker model run hf.co/MorphMind-AI/CFM-Methods-3B
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

CFM-Methods-3B · MorphMind

A tiny control model that reads a methods section and tells you where the methodology is unsound. Give it a methods or experimental-design block from any empirical-science paper (statistics, machine learning, quantitative biology, econometrics, materials science, or chemical physics) and it returns a structured verdict (support or refute), pinpoints the offending statement, and explains why. It is a high-recall screen: it surfaces methodological red flags such as data leakage, p-hacking, uncorrected multiple comparisons, train/test contamination, optional stopping, correlation-as-causation, post-hoc outlier removal, and unblinded scoring, so that a human reviewer misses almost nothing.

At just 3B parameters, CFM-Methods-3B delivers frontier-level methodology screening that runs on a single GPU, on-premise, at a tiny fraction of the cost of a frontier API. It is the compact member of MorphMind's Control Foundation Model (CFM) line --- models whose job is not to generate science but to check it.

By MorphMind. Research preview.

Benchmark --- methodology-flaw detection vs. frontier models

Evaluated on flaw types the model was never trained on (24 flaw families used for training, 12 held out for evaluation) and benchmarked head-to-head against frontier commercial models on the same held-out set:

| Model | Recall | Precision | Localization | False-positive rate (clean) |
|---|---|---|---|---|
| base Qwen2.5-3B | 0.30 | --- | 0.42 | 0.07 |
| GPT-4o | 0.86 | 0.64 | 0.94 | 0.47 |
| Claude Opus 4 | 0.96 | 0.78 | 0.97 | 0.28 |
| CFM-Methods-3B (ours) | 0.98 | 1.00 | 0.97 | 0.005 |

CFM-Methods-3B matches frontier recall and localization with the lowest false-positive rate in the table (0.005). It catches 98% of flaws from held-out families and pinpoints the exact flawed statement 97% of the time, on par with Claude Opus 4 (0.96 recall, 0.97 localization) and ahead of GPT-4o (0.86 recall, 0.94 localization), while the frontier models over-flag clean methods heavily (false-positive rates of 28% for Opus and 47% for GPT-4o). The result is frontier-grade methodology screening with the precision of a careful expert: on-prem, in a 3B model, at a tiny fraction of the cost.
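
For readers who want to score their own labelled set the same way, here is a minimal sketch of how recall, precision, and false-positive rate can be computed from verdicts. The card does not spell out its metric definitions, so the standard confusion-matrix definitions are assumed here ("positive" means a flawed methods block flagged as `refute`); localization is not covered, and the function name and toy data are hypothetical.

```python
def screening_metrics(is_flawed, flagged):
    """Compute verdict-level metrics.

    is_flawed[i] is the ground truth for item i; flagged[i] is True when
    the model's verdict was "refute". Standard definitions are assumed.
    """
    pairs = list(zip(is_flawed, flagged))
    tp = sum(1 for truth, flag in pairs if truth and flag)
    fn = sum(1 for truth, flag in pairs if truth and not flag)
    fp = sum(1 for truth, flag in pairs if not truth and flag)
    tn = sum(1 for truth, flag in pairs if not truth and not flag)
    nan = float("nan")
    return {
        "recall": tp / (tp + fn) if (tp + fn) else nan,
        "precision": tp / (tp + fp) if (tp + fp) else nan,
        # Share of clean methods blocks that were wrongly flagged.
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else nan,
    }


# Toy data (not from the benchmark): 4 flawed and 4 clean blocks.
truth = [True, True, True, True, False, False, False, False]
flags = [True, True, True, False, True, False, False, False]
print(screening_metrics(truth, flags))
```

Under these definitions precision depends on the mix of flawed and clean items in the evaluation set as well as on recall and false-positive rate, so precision figures are only comparable across models scored on the same set.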

Worked example

1 — it catches an uncorrected multiple-comparisons flaw. Given this methods block:

"We screened 60 candidate protein markers for association with disease status. Each marker was tested individually with a univariate logistic regression at the 0.05 level. The 14 markers reaching p < 0.05 in univariate tests are reported as significant and carried forward as the disease signature."

CFM-Methods-3B returns (verbatim model output):

{
  "analysis": "There is a methodological flaw: Many tests without correction inflate the false-positive rate.",
  "verdict": "refute",
  "error_spans": [
    {
      "text": "The 14 markers reaching p < 0.05 in univariate tests are reported as significant",
      "why": "Many tests without correction inflate the false-positive rate."
    }
  ],
  "action": "suggest_edit"
}

It pinpoints the exact offending sentence and names the failure mode — 60 simultaneous tests at α = 0.05 with no correction.
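
To see why this matters, the arithmetic behind the flag can be sketched in a few lines. This assumes, for illustration only, that all 60 null hypotheses are true and that the tests are independent; neither assumption comes from the example text, and Bonferroni is shown as one common correction, not as the correction the model recommends.

```python
n_tests = 60   # candidate markers, from the example above
alpha = 0.05   # per-test significance level, from the example above

# Expected number of markers passing p < alpha by chance alone
# (assumes every null hypothesis is true).
expected_false_positives = n_tests * alpha

# Probability of at least one false positive across all tests
# (additionally assumes the tests are independent).
fwer_if_independent = 1 - (1 - alpha) ** n_tests

# Bonferroni-corrected per-test threshold that keeps the
# family-wise error rate at or below alpha.
bonferroni_threshold = alpha / n_tests

print(f"expected false positives: {expected_false_positives:.1f}")  # about 3.0
print(f"family-wise error rate:   {fwer_if_independent:.3f}")       # about 0.954
print(f"Bonferroni threshold:     {bonferroni_threshold:.5f}")      # about 0.00083
```

Roughly three of the 14 reported markers would be expected by chance under these assumptions, and without a correction there is no way to tell which ones.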

2 — and it passes clean methods without crying wolf. Given a sound instrumental-variables design:

"We estimate the causal effect of schooling on wages with two-stage least squares, instrumenting years of education with quarter-of-birth. We report first-stage F-statistics to confirm instrument strength and cluster standard errors at the state level."

{
  "analysis": "The methodology is sound: instrument strength is verified and standard errors are clustered appropriately.",
  "verdict": "support",
  "error_spans": [],
  "action": "accept"
}

No false flag — the near-zero false-positive rate in the benchmark above is what this looks like in practice.

When & how to use it

Use it as a fast, private, first-pass methodology screen --- a pre-submission self-check for researchers, triage for journals / reviewers / grant panels, QA over a stack of submissions, or a check on AI-generated experimental designs. Review one methods block at a time (split a paper into its method / experiment / analysis sections and run each). Because it is tuned for recall, treat its flags as "worth a human's 30 seconds."

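# Requires transformers and torch; device_map="auto" also needs accelerate.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "MorphMind-AI/CFM-Methods-3B"
tok = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto")

# System prompt that pins the model to its JSON review schema.
SYS = ("You are a scientific methodology reviewer. Review the methods and respond ONLY with JSON: "
       "{\"analysis\":...,\"verdict\":\"support|refute\","
       "\"error_spans\":[{\"text\":...,\"why\":...}],\"action\":\"accept|suggest_edit\"}")

def review(methods):
    """Review one methods block and return the model's raw JSON string."""
    msgs = [{"role": "system", "content": SYS},
            {"role": "user", "content": "METHODS:\n" + methods}]
    # return_dict=True also returns the attention mask, which generate() uses.
    inputs = tok.apply_chat_template(msgs, add_generation_prompt=True, tokenize=True,
                                     return_dict=True, return_tensors="pt").to(model.device)
    # Greedy decoding keeps the verdict deterministic.
    out = model.generate(**inputs, max_new_tokens=320, do_sample=False)
    # Decode only the newly generated tokens, not the prompt.
    return tok.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)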
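
Since the model replies with a JSON string, a thin parsing layer makes it easier to run the screen section by section, as recommended above. The sketch below is one possible way to do that and is not part of the released code: `parse_review` and `screen_sections` are hypothetical helpers, and the tolerance for stray text around the JSON object is a defensive assumption, not documented model behavior.

```python
import json

REQUIRED_KEYS = {"analysis", "verdict", "error_spans", "action"}


def parse_review(raw):
    """Parse the model's reply into a dict and check the expected keys."""
    # Defensive: take the outermost {...} span in case extra text surrounds it.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    data = json.loads(raw[start:end + 1])
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if data["verdict"] not in ("support", "refute"):
        raise ValueError(f"unexpected verdict: {data['verdict']!r}")
    return data


def screen_sections(sections, review_fn):
    """Run review_fn on each named section and return only the flagged ones.

    sections maps a section name to its text; review_fn takes the text and
    returns the raw model reply (for example, the card's review function).
    """
    flagged = {}
    for name, text in sections.items():
        result = parse_review(review_fn(text))
        if result["verdict"] == "refute":
            flagged[name] = result["error_spans"]
    return flagged
```

With the `review` function defined above, a call such as `screen_sections({"Methods": methods_text, "Analysis": analysis_text}, review)` would return the error spans for each flagged section, ready to hand to a human reviewer.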

How it was built

A full-parameter fine-tune of Qwen2.5-3B-Instruct, trained with RLVR (Reinforcement Learning with Verifiable Rewards) under a localization-gated reward: a verdict is reinforced only if the model also points to the actual flawed statement, which teaches genuine reasoning rather than blanket flagging. The training data are public arXiv methods sections across statistics, machine learning, quantitative biology, econometrics, materials science, and chemical physics, with injected, paraphrased methodological flaws; evaluation uses held-out flaw families.
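
The gating idea can be illustrated with a small sketch. This is not the authors' reward function: the card only states that a verdict is reinforced when the model also points to the flawed statement. The token-overlap measure, the 0.5 threshold, the 0/1 reward values, and the handling of clean inputs below are all assumptions made for illustration, and every name is hypothetical.

```python
def token_overlap(a, b):
    """Jaccard overlap between the lowercase word sets of two strings."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def localization_gated_reward(pred_verdict, pred_spans, true_verdict,
                              true_flaw_text=None, min_overlap=0.5):
    """Return 1.0 only when the verdict is right AND the flaw is located.

    pred_spans is a list of predicted span texts; true_flaw_text is the
    injected flawed statement (None for clean methods blocks).
    """
    if pred_verdict != true_verdict:
        return 0.0
    if true_verdict == "support":
        # Assumption: a clean block earns reward only if no span is flagged.
        return 1.0 if not pred_spans else 0.0
    # Gate: a correct "refute" earns nothing unless some predicted span
    # overlaps the injected flaw strongly enough.
    located = any(token_overlap(span, true_flaw_text) >= min_overlap
                  for span in pred_spans)
    return 1.0 if located else 0.0
```

Under a reward of this shape, a policy that answers "refute" on everything collects no reward on flawed inputs unless it also quotes the right sentence, which is the behavior the gate is meant to discourage.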

Notes

A high-recall screen for first-pass review: ~98% of flaws surfaced with a near-zero false-alarm rate, designed to keep an expert in the loop for the final call.
Generalizes to methodological flaws it has never seen, across six empirical-science fields.
Part of MorphMind's growing Control Foundation Model family.

License

Released under the MorphMind CFM Research License (see LICENSE), incorporating the Qwen Research License of the Qwen2.5-3B base. Research / non-commercial use, with attribution to MorphMind and Qwen. For commercial licensing, contact MorphMind (morphmind.ai).

Downloads last month: 106

Safetensors
Model size: 3B params
Tensor type: BF16

Model tree for MorphMind-AI/CFM-Methods-3B
Base model: Qwen/Qwen2.5-3B
Finetuned: Qwen/Qwen2.5-3B-Instruct
Finetuned: this model
Quantizations: 1 model
