Dapinsky/PIT-4B-FT-202212-math-reasoning-dpo

This repo contains the fully merged model obtained by merging a PEFT LoRA adapter into Dapinsky/PIT-4B-FT-202212-math-reasoning-sft. No separate adapter needs to be loaded at inference time. The model is intended to load with the same Transformers API used for the base Diamegs PIT model:

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Dapinsky/PIT-4B-FT-202212-math-reasoning-dpo", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("Dapinsky/PIT-4B-FT-202212-math-reasoning-dpo", trust_remote_code=True)
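
Once the model and tokenizer are loaded as above, generation could look roughly like the sketch below. This is a minimal example under stated assumptions, not a documented usage pattern for this repo: `generate_answer` is a hypothetical helper, the prompt is passed through as plain text because the card does not specify a prompt or chat format, and the custom model code (loaded via `trust_remote_code=True`) is assumed to support the standard `generate` method.

```python
def generate_answer(model, tokenizer, prompt, max_new_tokens=256):
    """Hypothetical helper: greedy-decode a continuation for `prompt`.

    Assumes `model` and `tokenizer` were loaded as shown in the card and
    that the remote model code supports the standard `generate` API.
    """
    # Tokenize the raw prompt; no chat template is applied because the
    # card does not document one.
    inputs = tokenizer(prompt, return_tensors="pt")

    # Generate token ids; the output contains the prompt followed by new tokens.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Drop the prompt tokens so only the newly generated text is returned.
    prompt_length = inputs["input_ids"].shape[1]
    new_tokens = output_ids[0][prompt_length:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

A call would then be `generate_answer(model, tokenizer, "What is 12 * 7?")`, where the question is only an illustrative input.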

Training Summary

  • Base model: Dapinsky/PIT-4B-FT-202212-math-reasoning-sft
  • Fine-tuning method: LoRA post-training, merged into the base weights for upload.
  • Tokenizer files copied from: base model
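
The merge step summarized above can be sketched with the PEFT library as follows. This is an illustration under assumptions, not the script used to produce this repo: `merge_lora_adapter`, `adapter_path`, and `output_dir` are hypothetical names, and the adapter itself is not published here.

```python
def merge_lora_adapter(base_id, adapter_path, output_dir):
    """Hypothetical helper: fold a LoRA adapter into its base model and save it."""
    # Imports are local so the function can be defined without the
    # libraries installed; calling it requires `transformers` and `peft`.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Load the base (SFT) model that the adapter was trained on.
    base = AutoModelForCausalLM.from_pretrained(base_id, trust_remote_code=True)

    # Attach the adapter, then fold its weights into the base weights.
    model = PeftModel.from_pretrained(base, adapter_path)
    merged = model.merge_and_unload()

    # Save the merged weights as safetensors.
    merged.save_pretrained(output_dir, safe_serialization=True)

    # Copy the tokenizer files from the base model, as noted in the summary.
    tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
    tokenizer.save_pretrained(output_dir)
    return output_dir
```

After merging, the saved directory loads with plain `AutoModelForCausalLM.from_pretrained` and no longer depends on PEFT.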

LoRA Configuration

{
  "peft_type": "LORA",
  "task_type": "CAUSAL_LM",
  "r": 16,
  "lora_alpha": 32,
  "lora_dropout": 0.05,
  "target_modules": [
    "c_fc",
    "c_proj",
    "c_v",
    "c_k",
    "c_q"
  ],
  "base_model_name_or_path": "Dapinsky/PIT-4B-FT-202212-math-reasoning-sft"
}
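
To make the configuration concrete, the sketch below computes the standard LoRA scaling factor `lora_alpha / r` from the values above and applies the usual merge rule `W_merged = W + scaling * (B @ A)` to a toy matrix. It assumes the default scaling (rank-stabilized LoRA is not indicated in the configuration shown), and the 2x2 weight with rank-1 factors is illustrative only; the real adapter uses rank 16 on the listed target modules.

```python
import json

# Values copied from the LoRA configuration above.
LORA_CONFIG = json.loads("""
{
  "r": 16,
  "lora_alpha": 32,
  "target_modules": ["c_fc", "c_proj", "c_v", "c_k", "c_q"]
}
""")


def lora_scaling(config):
    """Standard LoRA scaling factor: lora_alpha / r."""
    return config["lora_alpha"] / config["r"]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def merge_weight(weight, lora_a, lora_b, scaling):
    """Return weight + scaling * (lora_b @ lora_a).

    weight is (out x in), lora_b is (out x r), lora_a is (r x in).
    """
    delta = matmul(lora_b, lora_a)
    return [
        [w + scaling * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


scaling = lora_scaling(LORA_CONFIG)

# Toy example with rank 1 (illustrative values, not real model weights).
weight = [[1.0, 0.0], [0.0, 1.0]]
lora_a = [[0.5, -1.0]]
lora_b = [[1.0], [2.0]]
merged = merge_weight(weight, lora_a, lora_b, scaling)

print(scaling)  # 2.0
print(merged)   # [[2.0, -2.0], [2.0, -3.0]]
```

With `r = 16` and `lora_alpha = 32`, every adapter update is scaled by 2.0 before being added to the corresponding base weight.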

Weights format: Safetensors · Model size: 4B params · Tensor type: F32