Reinforcement Learning
Transformers
Safetensors
llama
text-generation
theorem-proving
lean
math
grpo
text-generation-inference
Instructions to use formalmathatepfl/deepseek-prover-v2-grpo-800 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use formalmathatepfl/deepseek-prover-v2-grpo-800 with Transformers:
# Load model directly from transformers import AutoTokenizer, AutoModelForCausalLM tokenizer = AutoTokenizer.from_pretrained("formalmathatepfl/deepseek-prover-v2-grpo-800") model = AutoModelForCausalLM.from_pretrained("formalmathatepfl/deepseek-prover-v2-grpo-800") - Notebooks
- Google Colab
- Kaggle
deepseek-prover-v2-grpo-800
GRPO fine-tuned checkpoint at training step 800, converted from the actor FSDP checkpoint to standard Hugging Face Transformers format.
Base model: formalmathatepfl/deepseek-prover-v2-cpt-sft-1e
Details
- Architecture:
LlamaForCausalLM - Parameters: 6.91B
- Precision: bfloat16
- Context length: 65,536
- Training recipe: classic GRPO for theorem proving
Loading
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = "formalmathatepfl/deepseek-prover-v2-grpo-800"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, dtype="auto")
- Downloads last month
- 29
Model tree for formalmathatepfl/deepseek-prover-v2-grpo-800
Base model
deepseek-ai/DeepSeek-Prover-V2-7B Finetuned
formalmathatepfl/deepseek-prover-v2-cpt