Kimina-Prover-7B — RRMA Lean4 SFT Adapter

LoRA adapter fine-tuned on 336 successful Lean 4 proof search traces from the RRMA multi-agent harness.

Base model

AI-MO/Kimina-Prover-Preview-Distill-7B

Training

Dataset: vincentoh/rrma-lean4-agent-traces (reward=1.0 only)
Method: 4-bit QLoRA, r=16, alpha=32
Epochs: 3, max_length: 2048
Final loss: 0.045, token accuracy: 99.1%
Hardware: RTX 4070 Ti SUPER 16GB, ~35 min

What it learned

The edit→compile→diagnose→fix loop for Lean 4 proof search on Erdős #741(ii) and #125, all traces verified by hard Lean compiler reward (0/1).

Usage

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
base = AutoModelForCausalLM.from_pretrained(
    "AI-MO/Kimina-Prover-Preview-Distill-7B",
    quantization_config=bnb, device_map="auto"
)
model = PeftModel.from_pretrained(base, "vincentoh/kimina-prover-rrma-sft")
tokenizer = AutoTokenizer.from_pretrained("vincentoh/kimina-prover-rrma-sft")

Downloads last month: 15

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for vincentoh/kimina-prover-rrma-sft

Base model

Qwen/Qwen2.5-7B

Finetuned

Qwen/Qwen2.5-Math-7B

Finetuned

Qwen/Qwen2.5-Math-7B-Instruct

Finetuned

AI-MO/Kimina-Prover-Preview-Distill-7B

Adapter

(1)

this model