---
library_name: transformers
tags: []
---

## Model Description

I fine-tuned the Llama-3.2-1B-Instruct model with Direct Preference Optimization (DPO) on the Anthropic HH-RLHF and Magpie-Pro-DPO preference datasets. Both datasets were converted into the prompt/chosen/rejected format that DPO requires, with prompts pre-rendered through the model's chat template so they are ready to apply at training time.
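
As a rough illustration of that formatting step (a minimal sketch, not the exact pipeline used here; the input field names `prompt`, `chosen`, and `rejected` are assumptions), a preference example can be converted to the DPO layout with the prompt pre-rendered through the chat template:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")

def to_dpo_format(example):
    # Hypothetical field names for one preference pair:
    # a user prompt plus a preferred and a rejected completion.
    messages = [{"role": "user", "content": example["prompt"]}]
    return {
        # Render the prompt through the chat template so it is
        # "ready to apply" at training time.
        "prompt": tokenizer.apply_chat_template(
            messages, tokenize=False, add_generation_prompt=True
        ),
        "chosen": example["chosen"],
        "rejected": example["rejected"],
    }
```

A dataset in this shape can then be mapped with `datasets.Dataset.map` and passed to a DPO trainer such as TRL's `DPOTrainer`.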

Additionally, I modified the tokenizer's chat template to save tokens by removing the date lines it automatically injects into the system message, such as "Cutting Knowledge Date: December 2023".
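
One way to make that kind of edit (a minimal sketch, assuming the date lines sit on their own lines in the Jinja template; inspect `tokenizer.chat_template` to confirm before saving):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")

# The chat template is a Jinja string; drop every line that emits the
# auto-generated date header. The matched substrings are assumptions
# about the stock Llama 3.2 template, so verify them first.
template = tokenizer.chat_template
kept = [
    line
    for line in template.split("\n")
    if "Cutting Knowledge Date" not in line and "Today Date" not in line
]
tokenizer.chat_template = "\n".join(kept)

# Hypothetical output directory.
tokenizer.save_pretrained("./llama-3.2-1b-instruct-dpo")
```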