DPO Fine-tuned Qwen2.5-7B-Instruct
This is a PEFT LoRA adapter for Qwen2.5-7B-Instruct fine-tuned using Direct Preference Optimization (DPO).
Adapter Configuration
- Type: LoRA
- Target Model: Qwen/Qwen2.5-7B-Instruct
- LoRA Rank (r): 16
- LoRA Alpha: 32
- LoRA Dropout: 0.05
- Target Modules: q_proj, v_proj, k_proj, o_proj
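To make the configuration above concrete, here is a rough sketch of what it implies for the LoRA scaling factor and the number of trainable parameters. The rank, alpha, and target modules come from the list above. The projection shapes and layer count are assumptions taken from the published Qwen2.5-7B-Instruct configuration (hidden size 3584, 28 layers, 4 key/value heads of dimension 128) and should be checked against the base model's config.json.

```python
# Values stated in the adapter configuration above.
LORA_R = 16
LORA_ALPHA = 32
TARGET_MODULES = ["q_proj", "v_proj", "k_proj", "o_proj"]

# ASSUMPTION: shapes taken from the public Qwen2.5-7B-Instruct config;
# verify against the base model's config.json before relying on them.
NUM_LAYERS = 28
MODULE_SHAPES = {  # (in_features, out_features)
    "q_proj": (3584, 3584),
    "k_proj": (3584, 512),
    "v_proj": (3584, 512),
    "o_proj": (3584, 3584),
}

# LoRA scales the low-rank update B @ A by alpha / r.
scaling = LORA_ALPHA / LORA_R

def lora_param_count(in_features: int, out_features: int, r: int) -> int:
    """Parameters in one LoRA pair: A is (r x in), B is (out x r)."""
    return r * (in_features + out_features)

per_layer = sum(lora_param_count(*MODULE_SHAPES[m], LORA_R) for m in TARGET_MODULES)
total_trainable = per_layer * NUM_LAYERS

print(f"scaling = {scaling}")
print(f"trainable LoRA parameters = {total_trainable:,}")
```

Under these assumptions the adapter trains roughly ten million parameters, a small fraction of the 7B base model.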
Training Details
- Algorithm: Direct Preference Optimization (DPO)
- Dataset: Custom preference dataset from LIMA
- Training Samples: 100
- Epochs: 1
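For readers unfamiliar with DPO, the per-example objective can be sketched in plain Python. This is a minimal illustration of the standard DPO loss, not the training code used for this adapter. The function name `dpo_loss` and the `beta` value of 0.1 are assumptions, since the card does not state the hyperparameters used.

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair.

    Each argument is the summed log-probability of a full response
    under the policy (model being trained) or the frozen reference model.
    beta is a hypothetical default; the card does not report it.
    """
    # How much more the policy favors each response than the reference does.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    z = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(z)) == softplus(-z), computed in a numerically stable form.
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))

# When the policy equals the reference, the loss is log(2).
print(dpo_loss(-10.0, -10.0, -10.0, -10.0))
# When the policy raises the chosen response and lowers the rejected one,
# the loss drops below log(2).
print(dpo_loss(-5.0, -15.0, -10.0, -10.0))
```

The loss falls as the policy assigns relatively more probability to the chosen response than the reference model does, which is the preference signal DPO optimizes.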
Usage
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model in half precision, placing layers on available devices
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct",
    torch_dtype=torch.float16,
    device_map="auto",
)

# Attach the LoRA adapter weights on top of the base model
model = PeftModel.from_pretrained(
    base_model,
    "shayfeng/qwen-dpo-adapter",
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

# Generate
messages = [{"role": "user", "content": "Your prompt here"}]
# Format the conversation with the model's chat template
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
# The decoded text includes the prompt followed by the model's reply
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Model Performance
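DPO training is intended to improve response quality by shifting the model toward the preferred responses in the training pairs. No quantitative evaluation results are reported for this adapter.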
Generated for Assignment 4 - AI Model Fine-tuning