base_model: base_model | |
datasets: dataset_name | |
library_name: transformers | |
model_name: online-dpo-qwen2-2 | |
tags: | |
- trl | |
- online-dpo | |
- generated_from_trainer | |
- peft | |
licence: license | |
# Model Card for online-dpo-qwen2-2 | |
This model is a fine-tuned version of [Qwen/Qwen2-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2-0.5B-Instruct) on the https://huggingface.co/datasets/trl-lib/ultrafeedback-prompt dataset. |