Update README.md
| 5. SFT | End-to-End Knowledge Distillation | The composed hybrid model is fine-tuned via knowledge distillation, using an 8B model as a teacher to transfer rich, pre-trained knowledge (a minimal loss sketch follows this table). |
| 6. Alignment | Direct Preference Optimization (DPO) | In the final stage, DPO is used to align the model's preferences, with the distilled student model itself serving as the reference model for stability (see the DPO sketch below). |
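
To make the distillation step concrete, here is a minimal sketch of a logit-matching objective, assuming a temperature-softened KL divergence between the 8B teacher's and the hybrid student's next-token distributions. The function name, temperature value, and exact loss form are illustrative assumptions, not this repository's training code.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) over temperature-softened vocabulary distributions.

    Hypothetical helper: the loss form and temperature are assumptions.
    """
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # batchmean KL, scaled by T^2 so gradient magnitude stays comparable
    # across temperature choices
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature**2
```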
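
The DPO stage can be pictured with a similarly hedged sketch, assuming a recent version of `trl` (argument names such as `processing_class` vary across releases); the checkpoint path, dataset choice, and `beta` are placeholders. The key detail from the table is that the reference model is simply a second, frozen copy of the distilled student.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer
from datasets import load_dataset

PATH = "path/to/distilled-student"  # placeholder: the distilled hybrid checkpoint

student = AutoModelForCausalLM.from_pretrained(PATH)
reference = AutoModelForCausalLM.from_pretrained(PATH)  # second copy of the student, kept frozen
tokenizer = AutoTokenizer.from_pretrained(PATH)

trainer = DPOTrainer(
    model=student,
    ref_model=reference,  # the distilled student serves as its own reference
    args=DPOConfig(output_dir="dpo-out", beta=0.1),  # beta is an assumed value
    train_dataset=load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs"),
    processing_class=tokenizer,
)
trainer.train()
```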

## Training Data

| Stage   | Dataset                                                                  | License                   |
|---------|--------------------------------------------------------------------------|---------------------------|
| ILD/SFT | https://huggingface.co/datasets/teknium/OpenHermes-2.5                  | Refer to source materials |
| ILD/SFT | https://huggingface.co/datasets/tomg-group-umd/GenQA                    | CC BY-NC 4.0              |
| ILD/SFT | https://huggingface.co/datasets/BAAI/Infinity-Instruct                  | CC BY-SA 4.0              |
| DPO     | https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized   | MIT                       |
| DPO     | https://huggingface.co/datasets/HuggingFaceH4/orca_dpo_pairs            | MIT                       |
| DPO     | https://huggingface.co/datasets/JunxiongWang/llama3-ultrafeedback-armorm | MIT                      |
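
For reference, the tabled datasets can be pulled with the `datasets` library; the split names below are assumptions taken from the usual dataset cards, so verify them (and the licenses above) before training.

```python
from datasets import load_dataset

# ILD/SFT example (license: refer to source materials, per the table above)
sft_data = load_dataset("teknium/OpenHermes-2.5", split="train")

# DPO example ("train_prefs" is an assumed split name; check the dataset card)
dpo_data = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")
```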

## Getting Started

### Installation