Qwen 7B Theorem Engine v3 - LoRA Adapter

This repository contains the v3 LoRA adapter for a compact mathematical statement judgment and correction engine fine-tuned from Qwen/Qwen2.5-7B-Instruct.

Given a mathematical statement, the model decides whether it is CORRECT, FALSE, INCOMPLETE, GARBLED_BUT_IDENTIFIABLE, or UNCLEAR. It then returns a clean theorem-style correction, or reports:

Cannot identify the intended theorem.

The adapter is intended for Windows / Linux NVIDIA inference, merging, and retraining. For direct Apple Silicon inference, use the MLX 4-bit release: dots123/qwen-7b-theorem-engine-v3-mlx-q4.

Related Repositories

Artifact Repository
LoRA adapter dots123/qwen-7b-theorem-engine-v3
MLX 4-bit model dots123/qwen-7b-theorem-engine-v3-mlx-q4
Training data dots123/theorem-engine-v3-training-data
Source code GitHub repository linked from the project release page

Behavior

Category Behavior
correct Keep a correct theorem concise and clean.
false Replace a false statement with a correct theorem or statement.
missing_condition Add the missing hypotheses needed for correctness.
garbled_identifiable Infer the intended theorem and rewrite it fully.
formal_logic Handle formal or domain-specific mathematical statements.
need_check Output the fixed fallback when no theorem can be identified.

Examples

User: Every continuous function is differentiable.
Assistant: Every differentiable function is continuous.

User: A continuous function on an interval achieves its maximum and minimum.
Assistant: Every continuous function on a closed and bounded interval achieves its maximum and minimum.

User: Pythagorean: a sqrd plus b sqrd is c sqrd in a right triangle.
Assistant: If \(a\) and \(b\) are the lengths of the legs of a right triangle and \(c\) is the length of the hypotenuse, then \(a^2 + b^2 = c^2\).

User: Let thing be valid then A iff B maybe with x.
Assistant: Cannot identify the intended theorem.

Installation

pip install -U transformers peft accelerate bitsandbytes safetensors

Download the base model and this adapter:

hf download Qwen/Qwen2.5-7B-Instruct \
  --local-dir models/Qwen2.5-7B-Instruct

hf download dots123/qwen-7b-theorem-engine-v3 \
  --local-dir outputs/qwen_7b_v3

Inference

import re
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

BASE = "models/Qwen2.5-7B-Instruct"
ADAPTER = "outputs/qwen_7b_v3"

SYSTEM_PROMPT = """# Role
You are a mathematical statement judgment and correction engine.

# Task
First decide whether the user's mathematical statement is:
CORRECT, FALSE, INCOMPLETE, GARBLED_BUT_IDENTIFIABLE, or UNCLEAR.

# Output rules
1. You MUST first write a concise internal judgment inside a <think>...</think> block.
2. If CORRECT, output the clean mathematical statement only.
3. If FALSE, output the correct theorem or statement.
4. If INCOMPLETE, output the complete rigorous theorem with missing assumptions.
5. If GARBLED_BUT_IDENTIFIABLE, infer the intended theorem and rewrite it fully.
6. If UNCLEAR, output exactly: Cannot identify the intended theorem.
7. Never repeat malformed mathematical text unchanged.
8. Use standard LaTeX formatting for formulas ($...$ or $$...$$).
"""

tokenizer = AutoTokenizer.from_pretrained(BASE, trust_remote_code=True)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16,
    bnb_4bit_use_double_quant=True,
)

base_model = AutoModelForCausalLM.from_pretrained(
    BASE,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(base_model, ADAPTER)
model.eval()

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "Every continuous function is differentiable."},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=True,
        temperature=0.3,
        top_p=0.85,
        repetition_penalty=1.05,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.eos_token_id,
    )

text = tokenizer.decode(output_ids[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True)
answer = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()
answer = answer.replace("无法识别其意图定理。", "Cannot identify the intended theorem.")
print(answer)

Training Data

The v3 training dataset is public at dots123/theorem-engine-v3-training-data.

Dataset summary:

  • 8,000 records
  • English theorem judgment and correction examples
  • Categories: correct, false, missing_condition, garbled_identifiable, formal_logic, need_check
  • Each record contains category, instruction, input, and output
  • The target output includes a short <think>...</think> judgment block followed by the final answer

Category counts:

Category Count
correct 1,255
false 1,618
formal_logic 1,248
garbled_identifiable 1,865
missing_condition 1,612
need_check 402

Training Details

Item Value
Base model Qwen/Qwen2.5-7B-Instruct
Fine-tuning method QLoRA
Quantization 4-bit NF4 base model
LoRA rank / alpha / dropout 16 / 32 / 0.05
Target modules q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Optimizer paged_adamw_32bit
Learning rate 1e-4, cosine schedule
Effective batch size 16
Max sequence length 1024
Epochs 3
Optimizer steps 942
Trainable parameters 40,370,176, about 0.53% of the base model
Hardware Single RTX 4090, 24 GB VRAM
Training time About 1 hour
Final train loss About 0.062
Final eval loss About 0.013

Evaluation

A small smoke test was run against six canonical prompts:

  • Correct theorem: differentiability implies continuity
  • False theorem: continuous implies differentiable
  • Missing condition: extreme value theorem on an interval
  • Garbled identifiable theorem: Pythagorean theorem
  • Unclear theorem fallback
  • Multi-line algebra statement

All smoke-test cases passed the expected output-pattern checks.

Limitations

  • This is a research prototype, not a formal proof assistant.
  • Mathematical outputs should be checked before use in teaching, writing, or research.
  • The model is tuned for compact, single-statement theorem correction, not long proofs.
  • The model may over-normalize casual wording into theorem-style statements.
  • The internal <think> block is a compact training target, not a verified proof trace.

License

No standalone license has been selected for this release yet. The base model is subject to the license terms of Qwen/Qwen2.5-7B-Instruct.

Citation

If you use this adapter, please cite the base model and link back to this model repository and the public training dataset.

Downloads last month
37
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for dots123/qwen-7b-theorem-engine-v3

Base model

Qwen/Qwen2.5-7B
Adapter
(2144)
this model

Dataset used to train dots123/qwen-7b-theorem-engine-v3