amd/Zebra-Llama-3B-14MLA-14Mamba-DPO
Mingyuyang-1 committed · Commit ac125bf · 1 Parent(s): 2ca8016

Update README.md

Files changed (1): README.md (+17 -2)
README.md CHANGED
@@ -1,8 +1,10 @@
 ---
 base_model:
-- meta-llama/Llama-3.2-3B-Instruct
+- amd/Zebra-Llama-3B-14MLA-14Mamba-SFT
 datasets:
-- JunxiongWang/sftdatasetv3
+- HuggingFaceH4/ultrafeedback_binarized
+- HuggingFaceH4/orca_dpo_pairs
+- JunxiongWang/llama3-ultrafeedback-armorm
 model-index:
 - name: Zebra-Llama-3B-14MLA-14Mamba-DPO
   results: []
@@ -43,6 +45,19 @@ The Zebra-Llama models are not trained from scratch. Instead, they are composed
 | 5. SFT | End-to-End Knowledge Distillation | The composed hybrid model is fine-tuned via knowledge distillation, using an 8B model as a teacher to transfer rich, pre-trained knowledge. |
 | 6. Alignment | Direct Preference Optimization (DPO) | In the final stage, DPO is used to align the model's preferences, with the distilled student model itself serving as the reference model for stability. |
 
+
+## Training Data
+
+| Stage   | Dataset                                                                   | License                   |
+|---------|---------------------------------------------------------------------------|---------------------------|
+| ILD/SFT | https://huggingface.co/datasets/teknium/OpenHermes-2.5                    | Refer to source materials |
+| ILD/SFT | https://huggingface.co/datasets/tomg-group-umd/GenQA                      | CC BY-NC 4.0              |
+| ILD/SFT | https://huggingface.co/datasets/BAAI/Infinity-Instruct                    | CC BY-SA 4.0              |
+| DPO     | https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized     | MIT                       |
+| DPO     | https://huggingface.co/datasets/HuggingFaceH4/orca_dpo_pairs              | MIT                       |
+| DPO     | https://huggingface.co/datasets/JunxiongWang/llama3-ultrafeedback-armorm  | MIT                       |
+
+
 ## Getting Started
 
 ### Installation
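
The alignment recipe this commit documents (DPO on the three preference datasets above, with the distilled SFT checkpoint serving as both the policy initialization and the frozen reference model) maps onto TRL's `DPOTrainer`, which the `alignment-handbook` tag suggests was the training stack. The following is a minimal sketch, not the authors' actual script: it assumes TRL >= 0.12 (`processing_class` kwarg), that the hybrid MLA/Mamba checkpoint loads through `AutoModelForCausalLM` with `trust_remote_code=True` (the real repo may require its own package), and all hyperparameter values shown are hypothetical placeholders.

```python
# Hedged DPO sketch mirroring the pipeline table: policy and reference
# are both initialized from the SFT checkpoint, then the policy is
# aligned on preference pairs while the reference stays frozen.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

SFT_CKPT = "amd/Zebra-Llama-3B-14MLA-14Mamba-SFT"

# Assumption: the hybrid checkpoint is loadable via trust_remote_code.
model = AutoModelForCausalLM.from_pretrained(SFT_CKPT, trust_remote_code=True)
ref_model = AutoModelForCausalLM.from_pretrained(SFT_CKPT, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(SFT_CKPT, trust_remote_code=True)

# One of the three DPO datasets added in this commit; its "train_prefs"
# split carries prompt/chosen/rejected preference pairs.
train_dataset = load_dataset(
    "HuggingFaceH4/ultrafeedback_binarized", split="train_prefs"
)

args = DPOConfig(
    output_dir="zebra-llama-3b-dpo",
    beta=0.1,                        # KL-penalty strength; placeholder value
    per_device_train_batch_size=2,   # placeholder
    gradient_accumulation_steps=8,   # placeholder
    learning_rate=5e-7,              # placeholder
    num_train_epochs=1,              # placeholder
)

trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,             # distilled student as the KL anchor
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,
)
trainer.train()
```

Passing the SFT checkpoint explicitly as `ref_model` pins the KL anchor to the distilled student itself, which is the stability choice described in row 6 of the pipeline table.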