Aratako committed
Commit bacc058
1 Parent(s): 7c64d57

Update README.md

Files changed (1)
  1. README.md +122 -46
README.md CHANGED
@@ -1,69 +1,145 @@
  ---
- base_model: Aratako/Llama-Gemma-2-27b-SimPO-trial3
  library_name: transformers
- model_name: fft-simpo3-iterative-iter1
  tags:
  - generated_from_trainer
  - axolotl
  - trl
  - cpo
- licence: license
  ---

- # Model Card for fft-simpo3-iterative-iter1

- This model is a fine-tuned version of [Aratako/Llama-Gemma-2-27b-SimPO-trial3](https://huggingface.co/Aratako/Llama-Gemma-2-27b-SimPO-trial3).
- It has been trained using [TRL](https://github.com/huggingface/trl).

- ## Quick start

- ```python
- from transformers import pipeline
-
- question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
- generator = pipeline("text-generation", model="Aratako/fft-simpo3-iterative-iter1", device="cuda")
- output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
- print(output["generated_text"])
- ```

- ## Training procedure

- [<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/aratako-lm/27b-fft/runs/g9va2r0s)

- This model was trained with CPO, a method introduced in [Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation](https://huggingface.co/papers/2401.08417).

- ### Framework versions

- - TRL: 0.12.0
- - Transformers: 4.46.3
- - Pytorch: 2.3.1+cu121
- - Datasets: 3.1.0
- - Tokenizers: 0.20.3

- ## Citations

- Cite CPO as:

- ```bibtex
- @inproceedings{xu2024contrastive,
-     title = {{Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation}},
-     author = {Haoran Xu and Amr Sharaf and Yunmo Chen and Weiting Tan and Lingfeng Shen and Benjamin Van Durme and Kenton Murray and Young Jin Kim},
-     year = 2024,
-     booktitle = {Forty-first International Conference on Machine Learning, {ICML} 2024, Vienna, Austria, July 21-27, 2024},
-     publisher = {OpenReview.net},
-     url = {https://openreview.net/forum?id=51iwkioZpn}
- }
  ```

- Cite TRL as:
-
- ```bibtex
- @misc{vonwerra2022trl,
-     title = {{TRL: Transformer Reinforcement Learning}},
-     author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
-     year = 2020,
-     journal = {GitHub repository},
-     publisher = {GitHub},
-     howpublished = {\url{https://github.com/huggingface/trl}}
- }
- ```
 
  ---
+ base_model: Aratako/Llama-Gemma-2-27b-CPO_SimPO-iter1
  library_name: transformers
  tags:
  - generated_from_trainer
  - axolotl
  - trl
  - cpo
+ license:
+ - llama3.1
+ - gemma
  ---

+ # Llama-Gemma-2-27b-CPO_SimPO-iter2
+
+ ## Overview
+
+ This model applies a second round of CPO_SimPO to [Aratako/Llama-Gemma-2-27b-CPO_SimPO-iter1](https://huggingface.co/Aratako/Llama-Gemma-2-27b-CPO_SimPO-iter1), a model instruction-tuned from [google/gemma-2-27b](https://huggingface.co/google/gemma-2-27b) with supervised fine-tuning and [CPO_SimPO](https://github.com/fe1ixxu/CPO_SIMPO).
+
+ It was created and released as part of building a submission model for the competition of the [Matsuo Lab Large Language Model Course 2024](https://weblab.t.u-tokyo.ac.jp/lecture/course-list/large-language-model/).
+
+ This model is built with Llama and Qwen.
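+
+ A minimal inference sketch, mirroring the quick-start example of the previous model card, is shown below. The repository id `Aratako/Llama-Gemma-2-27b-CPO_SimPO-iter2` is assumed from this card's title and is not confirmed elsewhere.
+
+ ```python
+ from transformers import pipeline
+
+ # Assumed repo id, taken from this card's title.
+ generator = pipeline(
+     "text-generation",
+     model="Aratako/Llama-Gemma-2-27b-CPO_SimPO-iter2",
+     device_map="auto",
+ )
+ question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
+ output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
+ print(output["generated_text"])
+ ```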
+
+ ## Datasets used
+
+ - [Aratako/iterative-dpo-data-for-SimPO-iter2](https://huggingface.co/datasets/Aratako/iterative-dpo-data-for-SimPO-iter2)
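+
+ As a quick way to inspect the preference data, the sketch below loads the dataset with the `datasets` library; the `train` split matches the config's `train_on_split`, while the column layout (typical prompt/chosen/rejected preference pairs) is an assumption not confirmed by this card.
+
+ ```python
+ from datasets import load_dataset
+
+ # Load the preference dataset used for this CPO_SimPO iteration.
+ ds = load_dataset("Aratako/iterative-dpo-data-for-SimPO-iter2", split="train")
+ print(ds)     # column names and row count
+ print(ds[0])  # first preference example
+ ```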
+
+ ## License
+
+ Due to the data used for its training, this model is affected by the following licenses:
+
+ - It inherits the [META LLAMA 3.1 COMMUNITY LICENSE](https://www.llama.com/llama3_1/license/).
+ - It inherits the [Gemma Terms of Use](https://ai.google.dev/gemma/terms).
+ - It is affected by the [Qwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct/blob/main/LICENSE). That license is not inherited, but a notice such as "Built with Qwen" must be included.
+
+ ## Training details
+
+ This model was trained with [axolotl](https://github.com/axolotl-ai-cloud/axolotl). See the configuration file below for the training settings, including all hyperparameters.
+
+ [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
+ <details><summary>See axolotl config</summary>
+
+ axolotl version: `0.5.2`
+ ```yaml
+ base_model: Aratako/Llama-Gemma-2-27b-CPO_SimPO-iter1
+ model_type: AutoModelForCausalLM
+ tokenizer_type: AutoTokenizer
+
+ hub_model_id: Aratako/fft-simpo3-iterative-iter2
+ hub_strategy: "end"
+ push_dataset_to_hub:
+ hf_use_auth_token: true
+
+ plugins:
+   - axolotl.integrations.liger.LigerPlugin
+ liger_cross_entropy: false
+ liger_rope: true
+ liger_rms_norm: true
+ liger_swiglu: true
+ liger_fused_linear_cross_entropy: true
+
+ load_in_8bit: false
+ load_in_4bit: false
+ strict: false
+
+ chat_template: tokenizer_default
+ rl: simpo
+ rl_beta: 10.0
+ cpo_alpha: 0.05
+ simpo_gamma: 5.0
+ max_prompt_length: 512
+ max_length: 2048
+
+ datasets:
+   - path: Aratako/iterative-dpo-data-for-SimPO-iter2
+     type: gemma.custom
+     train_on_split: train
+
+ shuffle_merged_datasets: true
+ dataset_prepared_path: /workspace/data/fft-simpo3-iterative-iter2-data
+ output_dir: /workspace/data/27b-fft-simpo3-iterative-iter2
+
+ sequence_len: 2048
+ sample_packing: false
+ eval_sample_packing: false
+ pad_to_sequence_len: true
+
+ adapter:
+ lora_model_dir:
+ lora_r:
+ lora_alpha:
+ lora_dropout:
+ lora_target_linear:
+ lora_fan_in_fan_out:
+
+ wandb_project: 27b-fft
+ wandb_entity: aratako-lm
+ wandb_watch:
+ wandb_name: simpo3-iter2
+ wandb_log_model:
+
+ gradient_accumulation_steps: 8
+ micro_batch_size: 2
+ num_epochs: 1
+ optimizer: paged_adamw_8bit
+ lr_scheduler: cosine
+ cosine_min_lr_ratio: 0.1
+ learning_rate: 3e-7
+
+ train_on_inputs: false
+ group_by_length: false
+ bf16: auto
+ fp16:
+ tf32: false
+
+ gradient_checkpointing: true
+ early_stopping_patience:
+ auto_resume_from_checkpoints: true
+ local_rank:
+ logging_steps: 1
+ xformers_attention:
+ flash_attention: true
+
+ save_strategy: steps
+ save_steps: 100
+ save_total_limit: 1
+
+ warmup_steps: 20
+ eval_steps:
+ eval_batch_size:
+ eval_table_size:
+ eval_max_new_tokens:
+ debug:
+ deepspeed: /workspace/axolotl/deepspeed_configs/zero3_bf16.json
+ weight_decay: 0.01
+ fsdp:
+ fsdp_config:
+ special_tokens:
+   pad_token: <pad>
  ```

+ </details><br>
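+
+ For reference, `rl: simpo` combined with a non-zero `cpo_alpha` corresponds to the CPO_SimPO objective: SimPO's length-normalized, margin-based preference loss plus a small behavior-cloning (NLL) term on the chosen responses. Below is a minimal sketch of that loss using this config's values (`rl_beta: 10.0`, `simpo_gamma: 5.0`, `cpo_alpha: 0.05`); it follows the published formulation, and the tensor names are illustrative rather than the trainer's actual API.
+
+ ```python
+ import torch
+ import torch.nn.functional as F
+
+ def cpo_simpo_loss(chosen_logps, rejected_logps, chosen_lens, rejected_lens,
+                    chosen_nll, beta=10.0, gamma=5.0, alpha=0.05):
+     """Sketch of the CPO_SimPO loss for a batch of preference pairs."""
+     # SimPO reward: length-normalized sequence log-probability, scaled by beta.
+     r_chosen = beta * chosen_logps / chosen_lens
+     r_rejected = beta * rejected_logps / rejected_lens
+     # SimPO: push the chosen-vs-rejected margin past the target margin gamma.
+     simpo = -F.logsigmoid(r_chosen - r_rejected - gamma)
+     # CPO: add a small NLL term that keeps probability mass on the chosen data.
+     return (simpo + alpha * chosen_nll).mean()
+
+ # Toy example with two preference pairs:
+ loss = cpo_simpo_loss(
+     chosen_logps=torch.tensor([-40.0, -55.0]),    # summed log-probs of chosen
+     rejected_logps=torch.tensor([-60.0, -70.0]),  # summed log-probs of rejected
+     chosen_lens=torch.tensor([20.0, 25.0]),
+     rejected_lens=torch.tensor([22.0, 30.0]),
+     chosen_nll=torch.tensor([2.0, 2.2]),          # mean per-token NLL of chosen
+ )
+ print(loss)
+ ```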