deepseek-prover-v2-grpo-800

GRPO (Group Relative Policy Optimization) fine-tuned checkpoint at training step 800, converted from the actor FSDP checkpoint to standard Hugging Face Transformers format.

Base model: formalmathatepfl/deepseek-prover-v2-cpt-sft-1e

Details

  • Architecture: LlamaForCausalLM
  • Parameters: 6.91B
  • Precision: bfloat16
  • Context length: 65,536 tokens
  • Training recipe: classic GRPO for theorem proving
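
The parameter count and precision listed above give a rough lower bound on the memory needed just to hold the weights. The sketch below works through that arithmetic; the helper name `weight_memory_gib` is hypothetical, and the estimate deliberately ignores activations, the KV cache, and framework overhead, all of which grow with batch size and sequence length.

```python
# Back-of-the-envelope weight footprint from the figures on this card.
# Assumption: every parameter is stored in bfloat16 (2 bytes each).

NUM_PARAMS = 6.91e9          # "Parameters: 6.91B" from the Details list
BYTES_PER_BF16 = 2           # bfloat16 is a 16-bit format


def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Return the size of the raw weights in GiB (2**30 bytes)."""
    return num_params * bytes_per_param / 2**30


if __name__ == "__main__":
    gib = weight_memory_gib(NUM_PARAMS, BYTES_PER_BF16)
    print(f"Weights alone: about {gib:.2f} GiB")
```

Under these assumptions the weights come to a little under 13 GiB, so a 16 GB accelerator leaves only limited headroom for long-context generation.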

Loading

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "formalmathatepfl/deepseek-prover-v2-grpo-800"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, dtype="auto")
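
Once the model is loaded, generating a proof attempt follows the usual Transformers pattern. The following is a minimal sketch, not a documented recipe for this checkpoint: the prompt layout in `build_prompt`, the helper name `generate_proof`, and the choice of greedy decoding are all assumptions, since this card does not specify a prompt template or sampling settings. It also assumes PyTorch is installed and a Transformers version recent enough to accept the `dtype` argument used above.

```python
MODEL_ID = "formalmathatepfl/deepseek-prover-v2-grpo-800"


def build_prompt(theorem_statement: str) -> str:
    """Wrap a Lean 4 statement in a plain instruction.

    Hypothetical layout: the model card does not document a prompt format.
    """
    return "Complete the following Lean 4 proof:\n\n" + theorem_statement.strip() + "\n"


def generate_proof(theorem_statement: str, max_new_tokens: int = 512) -> str:
    """Load the checkpoint and greedily decode a continuation of the prompt."""
    # Imported lazily so build_prompt can be used without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, dtype="auto")

    # Use a GPU when one is available; CPU inference works but is very slow.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device)
    model.eval()

    inputs = tokenizer(build_prompt(theorem_statement), return_tensors="pt").to(device)
    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=False,  # greedy decoding, chosen here only for reproducibility
        )

    # Drop the prompt tokens and return only the newly generated text.
    prompt_length = inputs["input_ids"].shape[1]
    return tokenizer.decode(output_ids[0, prompt_length:], skip_special_tokens=True)
```

Calling `generate_proof("theorem two_eq : 1 + 1 = 2 := by")` downloads the weights on first use and returns the model's continuation as a string. The output is only a candidate proof and still has to be checked by the Lean compiler.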