SubgoalXL - Subgoal-Based Expert Learning for Theorem Proving

Model Description

SubgoalXL is an advanced model for formal theorem proving, leveraging subgoal-based expert learning to enhance LLMs' formal reasoning capabilities. It addresses the challenges of scarce theorem-proving data and multi-step reasoning by optimizing data efficiency and employing subgoal-level supervision. SubgoalXL iteratively refines formal statement, proof, and subgoal generators, allowing it to extract richer information from limited proofs and generate more precise formal proofs.

Example Usage

Below is an example code for using SubgoalXL with its custom prompt template for generating formal proofs:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("xl-zhao/formal_proof_generator_v1_iter3")
model = AutoModelForCausalLM.from_pretrained("xl-zhao/formal_proof_generator_v1_iter3")

informal_statement = ""  # Your informal statement here
formal_statement = ""    # The formal statement generated by the model will be inserted here

# Prompt option 1
prompt1 = f"Develop formal proofs using Isabelle, leveraging auxiliary tools such as Sledgehammer to enhance the proving process.\n\n### Input\n(*Informal Statement:\n{informal_statement}*)\n{formal_statement}\n\n### Output"

# Prompt option 2
prompt2 = f"Utilize Isabelle for the systematic verification of theorem proofs, employing Sledgehammer as a supplementary tool. Approach the problem in a step-by-step manner.\n\n### Problem\n{informal_statement}\n\n### Isabelle Proof\n{formal_statement}"

# Choose the prompt (both prompts are acceptable)
prompt = prompt1  # You can switch to prompt2 if needed

# Tokenize and generate the formal proof
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=2048, temperature=0.8)

# Decode the output
formal_proof = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(formal_proof)

Intended Use

This model is intended for automated theorem proving, especially in environments that require high levels of mathematical rigor, such as Isabelle. By employing subgoal-based proof generation and expert learning, SubgoalXL is optimized for solving complex reasoning tasks in formal mathematics.

Performance

SubgoalXL sets a new state-of-the-art benchmark in formal theorem proving within the Isabelle environment, achieving an accuracy of 56.1% on the miniF2F dataset, an absolute improvement of 4.9% over previous approaches. The model has successfully solved the following problems:

41 AMC12 problems
9 AIME problems
3 IMO problems

For more information on the model's design and performance, please refer to the paper SubgoalXL: Subgoal-Based Expert Learning for Theorem Proving.

Limitations

While SubgoalXL excels in formal theorem proving, its usage is tailored to this specific domain and may not generalize well to other tasks beyond formal logic and proof generation.

Citation

If you use SubgoalXL in your research, please cite our paper:

@article{zhao2024subgoalxl,
      title = {SubgoalXL: Subgoal-based Expert Learning for Theorem Proving},
      author = {Zhao, Xueliang and Zheng, Lin and Bo, Haige and Hu, Changran and Thakker, Urmish and Kong, Lingpeng},
      journal={arXiv preprint arXiv:2408.11172},
      url={https://arxiv.org/abs/2408.11172}, 
      year = {2024},
}

Contact

For more information, please visit the project's GitHub repository or reach out via email at xlzhao22@connect.hku.hk.

xl-zhao
/

formal_proof_generator_v3_iter3