qgallouedec HF staff committed on
Commit 4844b4a
1 Parent(s): 73be738

End of training

Files changed (2)
  1. README.md +15 -0
  2. generation_config.json +14 -0
README.md ADDED
@@ -0,0 +1,15 @@
+ ---
+ base_model: Qwen/Qwen2-0.5B-Instruct
+ datasets: dataset_name
+ library_name: transformers
+ model_name: online-dpo-qwen2-4
+ tags:
+ - trl
+ - online-dpo
+ - generated_from_trainer
+ licence: license
+ ---
+
+ # Model Card for online-dpo-qwen2-4
+
+ This model is a fine-tuned version of [Qwen/Qwen2-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2-0.5B-Instruct) on the https://huggingface.co/datasets/trl-lib/ultrafeedback-prompt dataset.
generation_config.json ADDED
@@ -0,0 +1,14 @@
+ {
+   "bos_token_id": 151643,
+   "do_sample": true,
+   "eos_token_id": [
+     151645,
+     151643
+   ],
+   "pad_token_id": 151643,
+   "repetition_penalty": 1.1,
+   "temperature": 0.7,
+   "top_k": 20,
+   "top_p": 0.8,
+   "transformers_version": "4.45.0.dev0"
+ }
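
For readers unfamiliar with the sampling fields in generation_config.json, the following is a minimal, standard-library-only sketch of how `temperature`, `top_k`, and `top_p` might interact when `do_sample` is true. The helper `filtered_probs` and the toy logits are hypothetical and exist only for illustration; this is not the transformers implementation, and `repetition_penalty` is deliberately left out to keep the sketch short.

```python
import json
import math
import random

# Values copied from the generation_config.json added in this commit.
GENERATION_CONFIG = json.loads("""
{
  "bos_token_id": 151643,
  "do_sample": true,
  "eos_token_id": [151645, 151643],
  "pad_token_id": 151643,
  "repetition_penalty": 1.1,
  "temperature": 0.7,
  "top_k": 20,
  "top_p": 0.8,
  "transformers_version": "4.45.0.dev0"
}
""")


def filtered_probs(logits, temperature, top_k, top_p):
    """Return {token_id: probability} after temperature, top-k, then top-p.

    Hypothetical helper for illustration only (an assumption, not library code).
    """
    # A temperature below 1 sharpens the distribution before filtering.
    scaled = [x / temperature for x in logits]
    # Top-k: keep only the k highest-scoring token ids.
    ranked = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
    ranked = ranked[:top_k]
    # Softmax over the survivors (subtract the max for numerical stability).
    peak = scaled[ranked[0]]
    weights = [math.exp(scaled[i] - peak) for i in ranked]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Top-p: keep the smallest prefix whose cumulative mass reaches top_p.
    kept, cumulative = [], 0.0
    for token_id, p in zip(ranked, probs):
        kept.append((token_id, p))
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalize so the kept probabilities sum to 1.
    mass = sum(p for _, p in kept)
    return {token_id: p / mass for token_id, p in kept}


# Toy logits over a 4-token vocabulary (made up for this example).
toy_logits = [2.0, 1.0, 0.5, -1.0]
dist = filtered_probs(
    toy_logits,
    GENERATION_CONFIG["temperature"],
    GENERATION_CONFIG["top_k"],
    GENERATION_CONFIG["top_p"],
)
# Draw one token id from the filtered distribution with a fixed seed.
next_token = random.Random(0).choices(list(dist), weights=list(dist.values()))[0]
```

With these toy logits, `top_k` of 20 has no effect because the vocabulary holds only four tokens, and `top_p` of 0.8 is what trims the candidate set down to the two most likely tokens.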