Commit ac125bf · Parent(s): 2ca8016
Update README.md
README.md CHANGED

@@ -1,8 +1,10 @@
 ---
 base_model:
--
+- amd/Zebra-Llama-3B-14MLA-14Mamba-SFT
 datasets:
--
+- HuggingFaceH4/ultrafeedback_binarized
+- HuggingFaceH4/orca_dpo_pairs
+- JunxiongWang/llama3-ultrafeedback-armorm
 model-index:
 - name: Zebra-Llama-3B-14MLA-14Mamba-DPO
   results: []
@@ -43,6 +45,19 @@ The Zebra-Llama models are not trained from scratch. Instead, they are composed
 | 5. SFT | End-to-End Knowledge Distillation | The composed hybrid model is fine-tuned via knowledge distillation, using an 8B model as a teacher to transfer rich, pre-trained knowledge. |
 | 6. Alignment | Direct Preference Optimization (DPO) | In the final stage, DPO is used to align the model's preferences, with the distilled student model itself serving as the reference model for stability. |
 
+
+## Training Data
+
+| Stage   | Dataset                                                                   | License                   |
+|---------|---------------------------------------------------------------------------|---------------------------|
+| ILD/SFT | https://huggingface.co/datasets/teknium/OpenHermes-2.5                    | Refer to source materials |
+| ILD/SFT | https://huggingface.co/datasets/tomg-group-umd/GenQA                      | CC BY-NC 4.0              |
+| ILD/SFT | https://huggingface.co/datasets/BAAI/Infinity-Instruct                    | CC BY-SA 4.0              |
+| DPO     | https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized     | MIT                       |
+| DPO     | https://huggingface.co/datasets/HuggingFaceH4/orca_dpo_pairs              | MIT                       |
+| DPO     | https://huggingface.co/datasets/JunxiongWang/llama3-ultrafeedback-armorm  | MIT                       |
+
+
 ## Getting Started
 
 ### Installation