
dpo-Qwen1.5-0.5B-Chat-alignment-handbook is a chat model fine-tuned from Qwen/Qwen1.5-0.5B-Chat using direct preference optimization (DPO) on the HuggingFaceH4/ultrafeedback_binarized dataset.
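To illustrate the DPO objective used here, below is a minimal per-example sketch of the standard DPO loss, -log σ(β·margin), where the margin compares how much the policy prefers the chosen over the rejected response relative to the reference model. The function name, the example log-probabilities, and the β value are illustrative, not taken from this model's training configuration.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log(sigmoid(beta * margin)).

    The margin is the policy's chosen-vs-rejected log-prob gap minus the
    reference model's gap, so the loss rewards shifting preference toward
    the chosen response relative to the reference.
    """
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    logits = beta * margin
    return -math.log(1.0 / (1.0 + math.exp(-logits)))

# Hypothetical sequence log-probs: the policy widens the chosen/rejected gap
# relative to the reference, so the loss drops below -log(0.5) ~= 0.693.
loss = dpo_loss(-10.0, -20.0, -12.0, -18.0)
```

In practice these per-token log-probabilities come from the policy and a frozen reference copy of the base model, averaged or summed over each response; this sketch only shows the scalar loss computation.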

Limitations of dpo-Qwen1.5-0.5B-Chat-alignment-handbook

  • Generates Inaccurate Code and Facts: The model may produce incorrect code snippets and factual statements. Users should treat these outputs as suggestions or starting points, not as definitive or accurate solutions.

  • Unreliable Responses to Instructions: The model has not undergone additional instruction fine-tuning beyond the base chat model. As a result, it may struggle or fail to adhere to intricate or nuanced instructions provided by users.
