PromptCoT: Synthesizing Olympiad-Level Problems for Mathematical Reasoning in Large Language Models

🚀 Overview

The PromptCoT Problem Generation Model is a lightweight yet powerful model for synthesizing high-quality Olympiad-level mathematical problems. It enables the scalable construction of problem sets to facilitate post-training tasks such as Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL). By systematically modeling expert problem design, PromptCoT helps generate logically consistent and intellectually demanding problems at scale.

For more details, refer to our paper on arXiv: 🔗 PromptCoT: Synthesizing Olympiad-Level Problems for Mathematical Reasoning in Large Language Models (arXiv:2503.02324).

🔥 Quick Start: Using the Model

1️⃣ Install Dependencies

pip install transformers vllm torch accelerate

2️⃣ Load the Model with Hugging Face Transformers

You can run inference directly with Hugging Face's generate API:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "xl-zhao/PromptCoT-Problem-Generation-Model"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to("cuda")

foundational_concepts = [
    "Ability to apply quantitative reasoning and estimation techniques to solve problems, including making approximations and using logical deductions to arrive at a solution.",
    "Ability to solve equations involving complex numbers, including finding conditions under which two complex numbers are equal, particularly in the context of their magnitudes and arguments.",
    "Fractional arithmetic: Performing calculations with fractions to determine the final probability.",
    "Interpreting and solving problems involving nested operations or functions.",
    "Using logical reasoning to connect given data points and derive conclusions."
]

difficulty_level = "HMMT-Feb"

prompt = (
    "Given foundational concepts and difficulty level, identify connections and develop a question "
    "that integrates these concepts with appropriate complexity.\n\n"
    "Foundational Concepts:\n"
    + "\n".join(f"{i+1}. {concept}" for i, concept in enumerate(foundational_concepts))
    + f"\n\nDifficulty Level: {difficulty_level}"
)

inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

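with torch.no_grad():
    # do_sample=True makes the temperature setting take effect (greedy decoding ignores it).
    # max_length counts the prompt tokens as well as the generated ones.
    output = model.generate(**inputs, max_length=4096, do_sample=True, temperature=0.6)

# The decoded text contains the prompt followed by the generated problem.
generated_problem = tokenizer.decode(output[0], skip_special_tokens=True)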
print(generated_problem)
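
For decoder-only models, generate returns the prompt tokens followed by the newly generated tokens, so the decoded text above begins with the prompt itself. A minimal sketch of keeping only the continuation is shown below. It assumes `inputs` and `output` are the objects from the snippet above; the helper name `continuation_ids` is illustrative and not part of any library.

```python
def continuation_ids(output_ids, prompt_length):
    """Return only the tokens produced after the prompt.

    Works on any sliceable sequence, such as a list or a 1-D torch tensor.
    """
    return output_ids[prompt_length:]


# Usage with the objects from the snippet above (left as comments so this
# block runs without loading the model):
#   prompt_length = inputs["input_ids"].shape[1]
#   new_tokens = continuation_ids(output[0], prompt_length)
#   generated_problem = tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Slicing by token count rather than by string prefix avoids mismatches when decoding does not reproduce the prompt text character for character.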

⚡ Using vLLM for Fast Inference

For optimized inference, use vLLM:

from vllm import LLM, SamplingParams

model_name = "xl-zhao/PromptCoT-Problem-Generation-Model"
llm = LLM(model=model_name, tensor_parallel_size=1)

foundational_concepts = [
    "Ability to apply quantitative reasoning and estimation techniques to solve problems, including making approximations and using logical deductions to arrive at a solution.",
    "Ability to solve equations involving complex numbers, including finding conditions under which two complex numbers are equal, particularly in the context of their magnitudes and arguments.",
    "Fractional arithmetic: Performing calculations with fractions to determine the final probability.",
    "Interpreting and solving problems involving nested operations or functions.",
    "Using logical reasoning to connect given data points and derive conclusions."
]

difficulty_level = "HMMT-Feb"

prompt = (
    "Given foundational concepts and difficulty level, identify connections and develop a question "
    "that integrates these concepts with appropriate complexity.\n\n"
    "Foundational Concepts:\n"
    + "\n".join(f"{i+1}. {concept}" for i, concept in enumerate(foundational_concepts))
    + f"\n\nDifficulty Level: {difficulty_level}"
)

sampling_params = SamplingParams(temperature=0.6, max_tokens=4096)
outputs = llm.generate([prompt], sampling_params)

print(outputs[0].outputs[0].text)
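
Because `llm.generate` accepts a list of prompts, several concept sets can be submitted in one call. The sketch below factors the prompt layout used above into a helper and collects every completion per prompt. The names `build_prompt` and `generate_batch` are hypothetical helpers introduced for illustration, and the vLLM objects are passed in as arguments so the block itself does not need a GPU.

```python
from typing import List, Sequence

INSTRUCTION = (
    "Given foundational concepts and difficulty level, identify connections and develop a question "
    "that integrates these concepts with appropriate complexity."
)


def build_prompt(concepts: Sequence[str], difficulty_level: str) -> str:
    """Format concepts and a difficulty label into the prompt layout shown above."""
    numbered = "\n".join(f"{i}. {c}" for i, c in enumerate(concepts, start=1))
    return (
        f"{INSTRUCTION}\n\n"
        f"Foundational Concepts:\n{numbered}"
        f"\n\nDifficulty Level: {difficulty_level}"
    )


def generate_batch(llm, sampling_params, prompts: Sequence[str]) -> List[List[str]]:
    """Run one batched generate call and return the completion texts for each prompt."""
    request_outputs = llm.generate(list(prompts), sampling_params)
    return [[completion.text for completion in request.outputs] for request in request_outputs]


# Usage with vLLM (left as comments so this block runs without a GPU):
#   from vllm import LLM, SamplingParams
#   llm = LLM(model="xl-zhao/PromptCoT-Problem-Generation-Model", tensor_parallel_size=1)
#   params = SamplingParams(temperature=0.6, max_tokens=4096, n=4)  # n=4: four samples per prompt
#   prompts = [build_prompt(concepts, "HMMT-Feb") for concepts in list_of_concept_sets]
#   problems = generate_batch(llm, params, prompts)
```

Setting `n` in `SamplingParams` to a value above 1 yields several candidate problems per prompt, which is a convenient starting point for later filtering.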

🔗 Full Usage & Advanced Options

For advanced usage, including batch inference and rejection sampling for filtering high-quality problems, refer to the full repository on GitHub:
🔹 GitHub: PromptCoT
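
The repository's own filtering procedure is not reproduced here. As a generic illustration of rejection sampling over generated problems, the sketch below keeps only candidates that pass an acceptance test. Both `rejection_sample` and the rule `looks_complete` are hypothetical placeholders; a real pipeline would likely substitute a much stronger judge, such as a solver or a scoring model.

```python
from typing import Callable, Iterable, List, Optional


def rejection_sample(
    candidates: Iterable[str],
    accept: Callable[[str], bool],
    max_keep: Optional[int] = None,
) -> List[str]:
    """Keep candidates for which accept() returns True, up to max_keep items."""
    kept: List[str] = []
    for candidate in candidates:
        if accept(candidate):
            kept.append(candidate)
            if max_keep is not None and len(kept) >= max_keep:
                break
    return kept


def looks_complete(problem: str, min_chars: int = 40) -> bool:
    """Hypothetical acceptance rule: non-trivial length and a finished-looking ending."""
    text = problem.strip()
    return len(text) >= min_chars and text.endswith((".", "?", "$"))


candidates = [
    "Too short.",
    "Find all complex numbers z such that |z| = 1 and z^3 = 1.",
    "An unfinished problem statement that trails off without ending and",
]
accepted = rejection_sample(candidates, looks_complete)
print(accepted)
```

Passing the acceptance rule as a callable keeps the sampling loop unchanged when the quality criterion is swapped out.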

📜 Citation

If you use PromptCoT, please consider citing:

@article{zhao2025promptcot,
  author    = {Zhao, Xueliang and Wu, Wei and Guan, Jian and Kong, Lingpeng},
  title     = {PromptCoT: Synthesizing Olympiad-Level Problems for Mathematical Reasoning in Large Language Models},
  year      = {2025},
  journal   = {arXiv preprint arXiv:2503.02324},
  url       = {http://arxiv.org/abs/2503.02324}
}
