Instructions to use dots123/qwen-7b-theorem-engine-v3 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- PEFT
How to use dots123/qwen-7b-theorem-engine-v3 with PEFT:
from peft import PeftModel from transformers import AutoModelForCausalLM base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B-Instruct") model = PeftModel.from_pretrained(base_model, "dots123/qwen-7b-theorem-engine-v3") - Notebooks
- Google Colab
- Kaggle
Qwen 7B Theorem Engine v3 - LoRA Adapter
This repository contains the v3 LoRA adapter for a compact mathematical
statement judgment and correction engine fine-tuned from
Qwen/Qwen2.5-7B-Instruct.
Given a mathematical statement, the model decides whether it is CORRECT, FALSE, INCOMPLETE, GARBLED_BUT_IDENTIFIABLE, or UNCLEAR. It then returns a clean theorem-style correction, or reports:
Cannot identify the intended theorem.
The adapter is intended for Windows / Linux NVIDIA inference, merging, and
retraining. For direct Apple Silicon inference, use the MLX 4-bit release:
dots123/qwen-7b-theorem-engine-v3-mlx-q4.
Related Repositories
| Artifact | Repository |
|---|---|
| LoRA adapter | dots123/qwen-7b-theorem-engine-v3 |
| MLX 4-bit model | dots123/qwen-7b-theorem-engine-v3-mlx-q4 |
| Training data | dots123/theorem-engine-v3-training-data |
| Source code | GitHub repository linked from the project release page |
Behavior
| Category | Behavior |
|---|---|
correct |
Keep a correct theorem concise and clean. |
false |
Replace a false statement with a correct theorem or statement. |
missing_condition |
Add the missing hypotheses needed for correctness. |
garbled_identifiable |
Infer the intended theorem and rewrite it fully. |
formal_logic |
Handle formal or domain-specific mathematical statements. |
need_check |
Output the fixed fallback when no theorem can be identified. |
Examples
User: Every continuous function is differentiable.
Assistant: Every differentiable function is continuous.
User: A continuous function on an interval achieves its maximum and minimum.
Assistant: Every continuous function on a closed and bounded interval achieves its maximum and minimum.
User: Pythagorean: a sqrd plus b sqrd is c sqrd in a right triangle.
Assistant: If \(a\) and \(b\) are the lengths of the legs of a right triangle and \(c\) is the length of the hypotenuse, then \(a^2 + b^2 = c^2\).
User: Let thing be valid then A iff B maybe with x.
Assistant: Cannot identify the intended theorem.
Installation
pip install -U transformers peft accelerate bitsandbytes safetensors
Download the base model and this adapter:
hf download Qwen/Qwen2.5-7B-Instruct \
--local-dir models/Qwen2.5-7B-Instruct
hf download dots123/qwen-7b-theorem-engine-v3 \
--local-dir outputs/qwen_7b_v3
Inference
import re
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
BASE = "models/Qwen2.5-7B-Instruct"
ADAPTER = "outputs/qwen_7b_v3"
SYSTEM_PROMPT = """# Role
You are a mathematical statement judgment and correction engine.
# Task
First decide whether the user's mathematical statement is:
CORRECT, FALSE, INCOMPLETE, GARBLED_BUT_IDENTIFIABLE, or UNCLEAR.
# Output rules
1. You MUST first write a concise internal judgment inside a <think>...</think> block.
2. If CORRECT, output the clean mathematical statement only.
3. If FALSE, output the correct theorem or statement.
4. If INCOMPLETE, output the complete rigorous theorem with missing assumptions.
5. If GARBLED_BUT_IDENTIFIABLE, infer the intended theorem and rewrite it fully.
6. If UNCLEAR, output exactly: Cannot identify the intended theorem.
7. Never repeat malformed mathematical text unchanged.
8. Use standard LaTeX formatting for formulas ($...$ or $$...$$).
"""
tokenizer = AutoTokenizer.from_pretrained(BASE, trust_remote_code=True)
bnb_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_quant_type="nf4",
bnb_4bit_compute_dtype=torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16,
bnb_4bit_use_double_quant=True,
)
base_model = AutoModelForCausalLM.from_pretrained(
BASE,
quantization_config=bnb_config,
device_map="auto",
trust_remote_code=True,
)
model = PeftModel.from_pretrained(base_model, ADAPTER)
model.eval()
messages = [
{"role": "system", "content": SYSTEM_PROMPT},
{"role": "user", "content": "Every continuous function is differentiable."},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
output_ids = model.generate(
**inputs,
max_new_tokens=256,
do_sample=True,
temperature=0.3,
top_p=0.85,
repetition_penalty=1.05,
eos_token_id=tokenizer.eos_token_id,
pad_token_id=tokenizer.eos_token_id,
)
text = tokenizer.decode(output_ids[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True)
answer = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()
answer = answer.replace("无法识别其意图定理。", "Cannot identify the intended theorem.")
print(answer)
Training Data
The v3 training dataset is public at
dots123/theorem-engine-v3-training-data.
Dataset summary:
- 8,000 records
- English theorem judgment and correction examples
- Categories:
correct,false,missing_condition,garbled_identifiable,formal_logic,need_check - Each record contains
category,instruction,input, andoutput - The target output includes a short
<think>...</think>judgment block followed by the final answer
Category counts:
| Category | Count |
|---|---|
correct |
1,255 |
false |
1,618 |
formal_logic |
1,248 |
garbled_identifiable |
1,865 |
missing_condition |
1,612 |
need_check |
402 |
Training Details
| Item | Value |
|---|---|
| Base model | Qwen/Qwen2.5-7B-Instruct |
| Fine-tuning method | QLoRA |
| Quantization | 4-bit NF4 base model |
| LoRA rank / alpha / dropout | 16 / 32 / 0.05 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Optimizer | paged_adamw_32bit |
| Learning rate | 1e-4, cosine schedule |
| Effective batch size | 16 |
| Max sequence length | 1024 |
| Epochs | 3 |
| Optimizer steps | 942 |
| Trainable parameters | 40,370,176, about 0.53% of the base model |
| Hardware | Single RTX 4090, 24 GB VRAM |
| Training time | About 1 hour |
| Final train loss | About 0.062 |
| Final eval loss | About 0.013 |
Evaluation
A small smoke test was run against six canonical prompts:
- Correct theorem: differentiability implies continuity
- False theorem: continuous implies differentiable
- Missing condition: extreme value theorem on an interval
- Garbled identifiable theorem: Pythagorean theorem
- Unclear theorem fallback
- Multi-line algebra statement
All smoke-test cases passed the expected output-pattern checks.
Limitations
- This is a research prototype, not a formal proof assistant.
- Mathematical outputs should be checked before use in teaching, writing, or research.
- The model is tuned for compact, single-statement theorem correction, not long proofs.
- The model may over-normalize casual wording into theorem-style statements.
- The internal
<think>block is a compact training target, not a verified proof trace.
License
No standalone license has been selected for this release yet. The base model is
subject to the license terms of
Qwen/Qwen2.5-7B-Instruct.
Citation
If you use this adapter, please cite the base model and link back to this model repository and the public training dataset.
- Downloads last month
- 37