Ejafa committed on
Commit 3a859d7
1 Parent(s): 27c750e

Update README.md

Files changed (1)
  1. README.md +5 -0
README.md CHANGED
@@ -18,7 +18,12 @@ model-index:
 
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment. -->
+ ## Description
+ This model was trained as part of the Reinforcement Learning - 24 project at Peking University, focusing on [simpo].
 
+ ## Authors
+ - Ejafa Bassam
+ - Yaroslav Ponomarenko
  # qwen2-0.5b-instruct-simpo-lr-5e-07-gamma-1.5
 
  This model is a fine-tuned version of [Qwen/Qwen2-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2-0.5B-Instruct) on the princeton-nlp/llama3-ultrafeedback dataset.