Aratako committed
Commit bacc058
1 Parent(s): 7c64d57

Update README.md

Files changed (1)
  1. README.md +122 -46
README.md CHANGED
@@ -1,69 +1,145 @@
  ---
- base_model: Aratako/Llama-Gemma-2-27b-SimPO-trial3
  library_name: transformers
- model_name: fft-simpo3-iterative-iter1
  tags:
  - generated_from_trainer
  - axolotl
  - trl
  - cpo
- licence: license
  ---

- # Model Card for fft-simpo3-iterative-iter1

- This model is a fine-tuned version of [Aratako/Llama-Gemma-2-27b-SimPO-trial3](https://huggingface.co/Aratako/Llama-Gemma-2-27b-SimPO-trial3).
- It has been trained using [TRL](https://github.com/huggingface/trl).

- ## Quick start

- ```python
- from transformers import pipeline
-
- question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
- generator = pipeline("text-generation", model="Aratako/fft-simpo3-iterative-iter1", device="cuda")
- output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
- print(output["generated_text"])
- ```

- ## Training procedure

- [<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/aratako-lm/27b-fft/runs/g9va2r0s)

- This model was trained with CPO, a method introduced in [Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation](https://huggingface.co/papers/2401.08417).

- ### Framework versions

- - TRL: 0.12.0
- - Transformers: 4.46.3
- - Pytorch: 2.3.1+cu121
- - Datasets: 3.1.0
- - Tokenizers: 0.20.3

- ## Citations

- Cite CPO as:

- ```bibtex
- @inproceedings{xu2024contrastive,
-     title = {{Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation}},
-     author = {Haoran Xu and Amr Sharaf and Yunmo Chen and Weiting Tan and Lingfeng Shen and Benjamin Van Durme and Kenton Murray and Young Jin Kim},
-     year = 2024,
-     booktitle = {Forty-first International Conference on Machine Learning, {ICML} 2024, Vienna, Austria, July 21-27, 2024},
-     publisher = {OpenReview.net},
-     url = {https://openreview.net/forum?id=51iwkioZpn}
- }
  ```

- Cite TRL as:
-
- ```bibtex
- @misc{vonwerra2022trl,
-     title = {{TRL: Transformer Reinforcement Learning}},
-     author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
-     year = 2020,
-     journal = {GitHub repository},
-     publisher = {GitHub},
-     howpublished = {\url{https://github.com/huggingface/trl}}
- }
- ```
 
  ---
+ base_model: Aratako/Llama-Gemma-2-27b-CPO_SimPO-iter1
  library_name: transformers
  tags:
  - generated_from_trainer
  - axolotl
  - trl
  - cpo
+ license:
+ - llama3.1
+ - gemma
  ---

+ # Llama-Gemma-2-27b-CPO_SimPO-iter2
+
+ ## Overview
+
+ This model applies a second round of CPO_SimPO to [Aratako/Llama-Gemma-2-27b-CPO_SimPO-iter1](https://huggingface.co/Aratako/Llama-Gemma-2-27b-CPO_SimPO-iter1), a model instruction-tuned from [google/gemma-2-27b](https://huggingface.co/google/gemma-2-27b) with supervised fine-tuning and [CPO_SimPO](https://github.com/fe1ixxu/CPO_SIMPO).
+
+ It was created and released as part of building a submission model for the competition of the [Matsuo Lab Large Language Model Course 2024](https://weblab.t.u-tokyo.ac.jp/lecture/course-list/large-language-model/).
+
+ This model is built with Llama and Qwen.
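+
+ A minimal inference sketch, mirroring the quick-start example of the previous model card, is shown below. The repository id `Aratako/Llama-Gemma-2-27b-CPO_SimPO-iter2` is assumed from this card's title and is not confirmed elsewhere.
+
+ ```python
+ from transformers import pipeline
+
+ # Assumed repo id, taken from this card's title.
+ generator = pipeline(
+     "text-generation",
+     model="Aratako/Llama-Gemma-2-27b-CPO_SimPO-iter2",
+     device_map="auto",
+ )
+ question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
+ output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
+ print(output["generated_text"])
+ ```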
+
+ ## Datasets used
+
+ - [Aratako/iterative-dpo-data-for-SimPO-iter2](https://huggingface.co/datasets/Aratako/iterative-dpo-data-for-SimPO-iter2)
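+
+ As a quick way to inspect the preference data, the sketch below loads the dataset with the `datasets` library; the `train` split matches the config's `train_on_split`, while the column layout (typical prompt/chosen/rejected preference pairs) is an assumption not confirmed by this card.
+
+ ```python
+ from datasets import load_dataset
+
+ # Load the preference dataset used for this CPO_SimPO iteration.
+ ds = load_dataset("Aratako/iterative-dpo-data-for-SimPO-iter2", split="train")
+ print(ds)     # column names and row count
+ print(ds[0])  # first preference example
+ ```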
+
+ ## License
+
+ Due to the data used for its training, this model is affected by the following licenses:
+
+ - It inherits the [META LLAMA 3.1 COMMUNITY LICENSE](https://www.llama.com/llama3_1/license/).
+ - It inherits the [Gemma Terms of Use](https://ai.google.dev/gemma/terms).
+ - It is affected by the [Qwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct/blob/main/LICENSE). That license is not inherited, but a notice such as "Built with Qwen" must be included.
+
+ ## Training details
+
+ This model was trained with [axolotl](https://github.com/axolotl-ai-cloud/axolotl). See the configuration file below for the training settings, including all hyperparameters.
+
+ [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
+ <details><summary>See axolotl config</summary>
+
+ axolotl version: `0.5.2`
+ ```yaml
+ base_model: Aratako/Llama-Gemma-2-27b-CPO_SimPO-iter1
+ model_type: AutoModelForCausalLM
+ tokenizer_type: AutoTokenizer
+
+ hub_model_id: Aratako/fft-simpo3-iterative-iter2
+ hub_strategy: "end"
+ push_dataset_to_hub:
+ hf_use_auth_token: true
+
+ plugins:
+   - axolotl.integrations.liger.LigerPlugin
+ liger_cross_entropy: false
+ liger_rope: true
+ liger_rms_norm: true
+ liger_swiglu: true
+ liger_fused_linear_cross_entropy: true
+
+ load_in_8bit: false
+ load_in_4bit: false
+ strict: false
+
+ chat_template: tokenizer_default
+ rl: simpo
+ rl_beta: 10.0
+ cpo_alpha: 0.05
+ simpo_gamma: 5.0
+ max_prompt_length: 512
+ max_length: 2048
+
+ datasets:
+   - path: Aratako/iterative-dpo-data-for-SimPO-iter2
+     type: gemma.custom
+     train_on_split: train
+
+ shuffle_merged_datasets: true
+ dataset_prepared_path: /workspace/data/fft-simpo3-iterative-iter2-data
+ output_dir: /workspace/data/27b-fft-simpo3-iterative-iter2
+
+ sequence_len: 2048
+ sample_packing: false
+ eval_sample_packing: false
+ pad_to_sequence_len: true
+
+ adapter:
+ lora_model_dir:
+ lora_r:
+ lora_alpha:
+ lora_dropout:
+ lora_target_linear:
+ lora_fan_in_fan_out:
+
+ wandb_project: 27b-fft
+ wandb_entity: aratako-lm
+ wandb_watch:
+ wandb_name: simpo3-iter2
+ wandb_log_model:
+
+ gradient_accumulation_steps: 8
+ micro_batch_size: 2
+ num_epochs: 1
+ optimizer: paged_adamw_8bit
+ lr_scheduler: cosine
+ cosine_min_lr_ratio: 0.1
+ learning_rate: 3e-7
+
+ train_on_inputs: false
+ group_by_length: false
+ bf16: auto
+ fp16:
+ tf32: false
+
+ gradient_checkpointing: true
+ early_stopping_patience:
+ auto_resume_from_checkpoints: true
+ local_rank:
+ logging_steps: 1
+ xformers_attention:
+ flash_attention: true
+
+ save_strategy: steps
+ save_steps: 100
+ save_total_limit: 1
+
+ warmup_steps: 20
+ eval_steps:
+ eval_batch_size:
+ eval_table_size:
+ eval_max_new_tokens:
+ debug:
+ deepspeed: /workspace/axolotl/deepspeed_configs/zero3_bf16.json
+ weight_decay: 0.01
+ fsdp:
+ fsdp_config:
+ special_tokens:
+   pad_token: <pad>
  ```

+ </details><br>
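+
+ For reference, `rl: simpo` combined with a non-zero `cpo_alpha` corresponds to the CPO_SimPO objective: SimPO's length-normalized, margin-based preference loss plus a small behavior-cloning (NLL) term on the chosen responses. Below is a minimal sketch of that loss using this config's values (`rl_beta: 10.0`, `simpo_gamma: 5.0`, `cpo_alpha: 0.05`); it follows the published formulation, and the tensor names are illustrative rather than the trainer's actual API.
+
+ ```python
+ import torch
+ import torch.nn.functional as F
+
+ def cpo_simpo_loss(chosen_logps, rejected_logps, chosen_lens, rejected_lens,
+                    chosen_nll, beta=10.0, gamma=5.0, alpha=0.05):
+     """Sketch of the CPO_SimPO loss for a batch of preference pairs."""
+     # SimPO reward: length-normalized sequence log-probability, scaled by beta.
+     r_chosen = beta * chosen_logps / chosen_lens
+     r_rejected = beta * rejected_logps / rejected_lens
+     # SimPO: push the chosen-vs-rejected margin past the target margin gamma.
+     simpo = -F.logsigmoid(r_chosen - r_rejected - gamma)
+     # CPO: add a small NLL term that keeps probability mass on the chosen data.
+     return (simpo + alpha * chosen_nll).mean()
+
+ # Toy example with two preference pairs:
+ loss = cpo_simpo_loss(
+     chosen_logps=torch.tensor([-40.0, -55.0]),    # summed log-probs of chosen
+     rejected_logps=torch.tensor([-60.0, -70.0]),  # summed log-probs of rejected
+     chosen_lens=torch.tensor([20.0, 25.0]),
+     rejected_lens=torch.tensor([22.0, 30.0]),
+     chosen_nll=torch.tensor([2.0, 2.2]),          # mean per-token NLL of chosen
+ )
+ print(loss)
+ ```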