Qwen 7B Theorem Engine v3 - LoRA Adapter

This repository contains the v3 LoRA adapter for a compact mathematical statement judgment and correction engine fine-tuned from Qwen/Qwen2.5-7B-Instruct.

Given a mathematical statement, the model decides whether it is CORRECT, FALSE, INCOMPLETE, GARBLED_BUT_IDENTIFIABLE, or UNCLEAR. It then returns a clean theorem-style correction, or reports:

Cannot identify the intended theorem.

The adapter is intended for Windows / Linux NVIDIA inference, merging, and retraining. For direct Apple Silicon inference, use the MLX 4-bit release: dots123/qwen-7b-theorem-engine-v3-mlx-q4.

Related Repositories

Artifact	Repository
LoRA adapter	`dots123/qwen-7b-theorem-engine-v3`
MLX 4-bit model	`dots123/qwen-7b-theorem-engine-v3-mlx-q4`
Training data	`dots123/theorem-engine-v3-training-data`
Source code	GitHub repository linked from the project release page

Behavior

Category	Behavior
`correct`	Keep a correct theorem concise and clean.
`false`	Replace a false statement with a correct theorem or statement.
`missing_condition`	Add the missing hypotheses needed for correctness.
`garbled_identifiable`	Infer the intended theorem and rewrite it fully.
`formal_logic`	Handle formal or domain-specific mathematical statements.
`need_check`	Output the fixed fallback when no theorem can be identified.

Examples

User: Every continuous function is differentiable.
Assistant: Every differentiable function is continuous.

User: A continuous function on an interval achieves its maximum and minimum.
Assistant: Every continuous function on a closed and bounded interval achieves its maximum and minimum.

User: Pythagorean: a sqrd plus b sqrd is c sqrd in a right triangle.
Assistant: If \(a\) and \(b\) are the lengths of the legs of a right triangle and \(c\) is the length of the hypotenuse, then \(a^2 + b^2 = c^2\).

User: Let thing be valid then A iff B maybe with x.
Assistant: Cannot identify the intended theorem.

Installation

pip install -U transformers peft accelerate bitsandbytes safetensors

Download the base model and this adapter:

hf download Qwen/Qwen2.5-7B-Instruct \
  --local-dir models/Qwen2.5-7B-Instruct

hf download dots123/qwen-7b-theorem-engine-v3 \
  --local-dir outputs/qwen_7b_v3

Inference

import re
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

BASE = "models/Qwen2.5-7B-Instruct"
ADAPTER = "outputs/qwen_7b_v3"

SYSTEM_PROMPT = """# Role
You are a mathematical statement judgment and correction engine.

# Task
First decide whether the user's mathematical statement is:
CORRECT, FALSE, INCOMPLETE, GARBLED_BUT_IDENTIFIABLE, or UNCLEAR.

# Output rules
1. You MUST first write a concise internal judgment inside a <think>...</think> block.
2. If CORRECT, output the clean mathematical statement only.
3. If FALSE, output the correct theorem or statement.
4. If INCOMPLETE, output the complete rigorous theorem with missing assumptions.
5. If GARBLED_BUT_IDENTIFIABLE, infer the intended theorem and rewrite it fully.
6. If UNCLEAR, output exactly: Cannot identify the intended theorem.
7. Never repeat malformed mathematical text unchanged.
8. Use standard LaTeX formatting for formulas ($...$ or $$...$$).
"""

tokenizer = AutoTokenizer.from_pretrained(BASE, trust_remote_code=True)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16,
    bnb_4bit_use_double_quant=True,
)

base_model = AutoModelForCausalLM.from_pretrained(
    BASE,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(base_model, ADAPTER)
model.eval()

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "Every continuous function is differentiable."},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=True,
        temperature=0.3,
        top_p=0.85,
        repetition_penalty=1.05,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.eos_token_id,
    )

text = tokenizer.decode(output_ids[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True)
answer = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()
answer = answer.replace("无法识别其意图定理。", "Cannot identify the intended theorem.")
print(answer)

Training Data

The v3 training dataset is public at dots123/theorem-engine-v3-training-data.

Dataset summary:

8,000 records
English theorem judgment and correction examples
Categories: correct, false, missing_condition, garbled_identifiable, formal_logic, need_check
Each record contains category, instruction, input, and output
The target output includes a short <think>...</think> judgment block followed by the final answer

Category counts:

Category	Count
`correct`	1,255
`false`	1,618
`formal_logic`	1,248
`garbled_identifiable`	1,865
`missing_condition`	1,612
`need_check`	402

Training Details

Item	Value
Base model	`Qwen/Qwen2.5-7B-Instruct`
Fine-tuning method	QLoRA
Quantization	4-bit NF4 base model
LoRA rank / alpha / dropout	16 / 32 / 0.05
Target modules	`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`
Optimizer	`paged_adamw_32bit`
Learning rate	`1e-4`, cosine schedule
Effective batch size	16
Max sequence length	1024
Epochs	3
Optimizer steps	942
Trainable parameters	40,370,176, about 0.53% of the base model
Hardware	Single RTX 4090, 24 GB VRAM
Training time	About 1 hour
Final train loss	About 0.062
Final eval loss	About 0.013

Evaluation

A small smoke test was run against six canonical prompts:

Correct theorem: differentiability implies continuity
False theorem: continuous implies differentiable
Missing condition: extreme value theorem on an interval
Garbled identifiable theorem: Pythagorean theorem
Unclear theorem fallback
Multi-line algebra statement

All smoke-test cases passed the expected output-pattern checks.

Limitations

This is a research prototype, not a formal proof assistant.
Mathematical outputs should be checked before use in teaching, writing, or research.
The model is tuned for compact, single-statement theorem correction, not long proofs.
The model may over-normalize casual wording into theorem-style statements.
The internal <think> block is a compact training target, not a verified proof trace.

License

No standalone license has been selected for this release yet. The base model is subject to the license terms of Qwen/Qwen2.5-7B-Instruct.

Citation

If you use this adapter, please cite the base model and link back to this model repository and the public training dataset.

Downloads last month: 37

Model tree for dots123/qwen-7b-theorem-engine-v3

Base model

Qwen/Qwen2.5-7B

Finetuned

Qwen/Qwen2.5-7B-Instruct

Adapter

(2144)

this model

dots123
/

qwen-7b-theorem-engine-v3