FINGU-AI committed
Commit 4f985e9
1 parent: e2b2dbf

Update README.md

Files changed (1): README.md (+1, -64)
README.md CHANGED

```diff
@@ -1,73 +1,10 @@
 ---
-license: other
+license: apache-2.0
 base_model: Qwen/Qwen1.5-0.5B-Chat
 tags:
 - trl
 - orpo
-- generated_from_trainer
 model-index:
 - name: Qwen-Orpo-v1
   results: []
 ---
-
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-
-# Qwen-Orpo-v1
-
-This model is a fine-tuned version of [Qwen/Qwen1.5-0.5B-Chat](https://huggingface.co/Qwen/Qwen1.5-0.5B-Chat) on an unknown dataset.
-It achieves the following results on the evaluation set:
-- Loss: 1.8005
-- Rewards/chosen: -0.1607
-- Rewards/rejected: -0.2097
-- Rewards/accuracies: 0.8936
-- Rewards/margins: 0.0490
-- Logps/rejected: -2.0968
-- Logps/chosen: -1.6067
-- Logits/rejected: -1.4714
-- Logits/chosen: -1.5480
-- Nll Loss: 1.7529
-- Log Odds Ratio: -0.4757
-- Log Odds Chosen: 0.5760
-
-## Model description
-
-More information needed
-
-## Intended uses & limitations
-
-More information needed
-
-## Training and evaluation data
-
-More information needed
-
-## Training procedure
-
-### Training hyperparameters
-
-The following hyperparameters were used during training:
-- learning_rate: 8e-06
-- train_batch_size: 2
-- eval_batch_size: 2
-- seed: 42
-- gradient_accumulation_steps: 4
-- total_train_batch_size: 8
-- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
-- lr_scheduler_type: linear
-- lr_scheduler_warmup_steps: 10
-- num_epochs: 1
-
-### Training results
-
-| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Nll Loss | Log Odds Ratio | Log Odds Chosen |
-|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|:--------:|:--------------:|:---------------:|
-| 1.9413        | 0.5001 | 1744 | 1.8005          | -0.1607        | -0.2097          | 0.8936             | 0.0490          | -2.0968        | -1.6067      | -1.4714         | -1.5480       | 1.7529   | -0.4757        | 0.5760          |
-
-
-### Framework versions
-
-- Transformers 4.40.2
-- Pytorch 2.2.0+cu121
-- Datasets 2.19.1
-- Tokenizers 0.19.1
```
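The removed hyperparameter list is internally consistent, and this can be sanity-checked. The sketch below assumes the usual Hugging Face convention that the effective batch size is `train_batch_size * gradient_accumulation_steps`; the total number of optimizer steps is *inferred* from the logged epoch/step pair (1744 steps at epoch 0.5001) and is not stated in the card:

```python
# Sanity-check the hyperparameters from the removed model card.
learning_rate = 8e-6
train_batch_size = 2
gradient_accumulation_steps = 4
warmup_steps = 10

# Effective batch size, as reported in the card (total_train_batch_size: 8).
total_train_batch_size = train_batch_size * gradient_accumulation_steps
print(total_train_batch_size)  # 8

# Linear warmup/decay schedule (lr_scheduler_type: linear, 10 warmup steps).
def linear_lr(step: int, total_steps: int, peak: float = learning_rate) -> float:
    if step < warmup_steps:
        return peak * step / warmup_steps
    return peak * max(0.0, (total_steps - step) / (total_steps - warmup_steps))

# The eval row logs step 1744 at epoch 0.5001, implying roughly
# 1744 / 0.5001 ~ 3487 optimizer steps for the full epoch (an inference,
# not a value from the card).
total_steps = round(1744 / 0.5001)
print(total_steps)  # 3487
print(linear_lr(warmup_steps, total_steps))  # peak LR reached after warmup
```

At step 1744 the schedule above would be roughly halfway through its decay, consistent with the single mid-epoch evaluation row in the removed results table.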