DPO Fine-tuned Qwen2.5-7B-Instruct

This is a PEFT LoRA adapter for Qwen2.5-7B-Instruct fine-tuned using Direct Preference Optimization (DPO).

Adapter Configuration

  • Type: LoRA
  • Target Model: Qwen/Qwen2.5-7B-Instruct
  • LoRA Rank (r): 16
  • LoRA Alpha: 32
  • LoRA Dropout: 0.05
  • Target Modules: q_proj, v_proj, k_proj, o_proj
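To give a sense of what these settings imply, here is a minimal sketch that computes the LoRA scaling factor (alpha / r) and a rough count of trainable adapter weights. The rank, alpha, and target modules come from the list above. The projection shapes and layer count are assumptions about Qwen2.5-7B-Instruct that are not stated in this card and should be checked against the model's config.json. The helper name `lora_param_count` is hypothetical.

```python
def lora_param_count(shapes, r):
    """Count the weights LoRA adds to a set of linear layers.

    Each adapted (d_in, d_out) matrix gains two low-rank factors,
    A (r x d_in) and B (d_out x r), i.e. r * (d_in + d_out) weights.
    """
    return sum(r * (d_in + d_out) for d_in, d_out in shapes)


# Values taken from the adapter configuration above.
R, ALPHA = 16, 32
scaling = ALPHA / R  # standard LoRA scaling applied to the low-rank update

# ASSUMPTION: shapes for Qwen2.5-7B-Instruct (hidden size 3584, grouped-query
# key/value width 512, 28 decoder layers). Verify against config.json.
HIDDEN, KV_DIM, NUM_LAYERS = 3584, 512, 28
per_layer = [
    (HIDDEN, HIDDEN),  # q_proj
    (HIDDEN, KV_DIM),  # k_proj
    (HIDDEN, KV_DIM),  # v_proj
    (HIDDEN, HIDDEN),  # o_proj
]

total = NUM_LAYERS * lora_param_count(per_layer, R)
print(f"scaling = {scaling}, trainable LoRA parameters = {total:,}")
```

Under these assumed shapes the adapter trains only a small fraction of the base model's weights, which is the main reason to use LoRA for a 7B model.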

Training Details

  • Algorithm: Direct Preference Optimization (DPO)
  • Dataset: Custom preference dataset from LIMA
  • Training Samples: 100
  • Epochs: 1
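For readers unfamiliar with DPO, the sketch below shows the per-example loss in plain Python. It is an illustration of the standard DPO objective, not the training code used for this adapter. The function name `dpo_loss`, the example log-probabilities, and `beta=0.1` are assumptions; this card does not state the beta value that was used.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log(sigmoid(beta * margin)).

    Inputs are summed log-probabilities of the chosen and rejected responses
    under the policy being trained and under the frozen reference model.
    beta=0.1 is an assumed value, not one reported for this adapter.
    """
    chosen_gain = policy_chosen_logp - ref_chosen_logp
    rejected_gain = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_gain - rejected_gain)
    # Numerically stable form of -log(sigmoid(margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Made-up log-probabilities for illustration only.
at_start = dpo_loss(-10.0, -12.0, -10.0, -12.0)  # policy equals reference
improved = dpo_loss(-8.0, -14.0, -10.0, -12.0)   # policy now favors "chosen"
print(at_start, improved)
```

When the policy matches the reference, the margin is zero and the loss equals log 2. The loss falls as the policy raises the likelihood of the chosen response relative to the rejected one, measured against the reference model.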

Usage

import torch  # needed for torch.float16 below
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct",
    torch_dtype=torch.float16,
    device_map="auto"
)

model = PeftModel.from_pretrained(
    base_model,
    "your-username/qwen-dpo-adapter"
)

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

# Generate
messages = [{"role": "user", "content": "Your prompt here"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
# Drop special tokens (e.g. chat-template markers) from the decoded text
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Model Performance

DPO training is intended to improve response quality by steering the model toward the preferred responses in the preference dataset. No quantitative evaluation results are reported.


Generated for Assignment 4 - AI Model Fine-tuning
