Update README.md
README.md CHANGED

@@ -2,6 +2,9 @@
 base_model: meta-llama/Llama-3.2-1B-Instruct
 datasets:
 - JunxiongWang/sftdatasetv3
+- HuggingFaceH4/ultrafeedback_binarized
+- HuggingFaceH4/orca_dpo_pairs
+- JunxiongWang/llama3-ultrafeedback-armorm
 model-index:
 - name: X-EcoMLA-1B1B-dynamic-0.95-DPO
   results: []

@@ -34,6 +37,16 @@ The X-EcoMLA models are not trained from scratch. Instead, they are composed fro
 | 3. SFT | End-to-End Knowledge Distillation | The initialized model is fine-tuned via knowledge distillation. |
 | 4. Alignment | Direct Preference Optimization (DPO) | In the final stage, DPO is used to align the model's preferences, with the distilled student model itself serving as the reference model for stability. |
 
+## Training Data
+
+| Stage | Dataset | License |
+|-------|---------|---------|
+| SFT | https://huggingface.co/datasets/teknium/OpenHermes-2.5 | Refer to source materials |
+| SFT | https://huggingface.co/datasets/tomg-group-umd/GenQA | CC BY-NC 4.0 |
+| SFT | https://huggingface.co/datasets/BAAI/Infinity-Instruct | CC BY-SA 4.0 |
+| DPO | https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized | MIT |
+| DPO | https://huggingface.co/datasets/HuggingFaceH4/orca_dpo_pairs | MIT |
+| DPO | https://huggingface.co/datasets/JunxiongWang/llama3-ultrafeedback-armorm | MIT |
 
 ## Getting Started
 
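The SFT row in the table above describes end-to-end knowledge distillation from the Llama-3.2-1B-Instruct teacher into the student. As a rough, non-authoritative illustration of that objective (not the repository's actual training code), a distillation step can minimize the KL divergence between the teacher's and student's next-token distributions; the student checkpoint path, prompt, and temperature below are placeholders.

```python
# Hedged sketch of an end-to-end KD step: KL(teacher || student) over next-token
# distributions. The student path, prompt, and temperature are placeholders, not
# the actual X-EcoMLA training configuration.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

teacher_id = "meta-llama/Llama-3.2-1B-Instruct"     # teacher listed as base_model in the card
student_id = "path/to/svd-initialized-mla-student"  # hypothetical; the real student uses a custom MLA architecture

tokenizer = AutoTokenizer.from_pretrained(teacher_id)
teacher = AutoModelForCausalLM.from_pretrained(teacher_id).eval()
student = AutoModelForCausalLM.from_pretrained(student_id)

def kd_loss(batch, temperature: float = 1.0) -> torch.Tensor:
    """KL(teacher || student) over next-token distributions for one batch."""
    with torch.no_grad():
        teacher_logits = teacher(**batch).logits
    student_logits = student(**batch).logits
    t = temperature
    return F.kl_div(
        F.log_softmax(student_logits / t, dim=-1),
        F.log_softmax(teacher_logits / t, dim=-1),
        log_target=True,
        reduction="batchmean",
    ) * (t * t)

batch = tokenizer("Explain multi-head latent attention.", return_tensors="pt")
loss = kd_loss(batch)
loss.backward()  # a real loop would wrap this in an optimizer step over the student only
```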
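The front matter and the new Training Data table list the SFT and DPO corpora. For readers who want to inspect the preference data before training, here is a small, hedged sketch using the `datasets` library; split and column names follow the individual dataset cards and should be verified there.

```python
# Hedged sketch: inspect one of the DPO preference datasets listed above.
# Split and column names come from the dataset card and may change over time.
from datasets import load_dataset

ultrafeedback = load_dataset("HuggingFaceH4/ultrafeedback_binarized")
print(ultrafeedback)            # shows the available splits (e.g. train_prefs, test_prefs)

prefs = ultrafeedback["train_prefs"]
example = prefs[0]
print(list(example.keys()))     # typically includes 'prompt', 'chosen', 'rejected'
print(example["prompt"][:200])  # preview the prompt text
```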
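The alignment row notes that the distilled student itself serves as the DPO reference model. Below is a minimal, hedged sketch of that setup with TRL's `DPOTrainer`: the checkpoint path and hyperparameters are placeholders rather than the released recipe, and the keyword for passing the tokenizer differs across TRL versions.

```python
# Hedged sketch: DPO where the SFT-distilled student is loaded twice, once as the
# trainable policy and once as the frozen reference model (as the table describes).
# The checkpoint path and hyperparameters are illustrative, not the released recipe.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

student_path = "path/to/distilled-student-checkpoint"  # hypothetical local checkpoint

tokenizer = AutoTokenizer.from_pretrained(student_path)
policy = AutoModelForCausalLM.from_pretrained(student_path)
reference = AutoModelForCausalLM.from_pretrained(student_path)  # same weights; DPOTrainer keeps it frozen

train_dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")

trainer = DPOTrainer(
    model=policy,
    ref_model=reference,
    args=DPOConfig(output_dir="x-ecomla-dpo", beta=0.1, per_device_train_batch_size=1),
    train_dataset=train_dataset,
    processing_class=tokenizer,  # older TRL releases take `tokenizer=` here instead
)
trainer.train()
```

Loading the same distilled checkpoint as both policy and reference keeps the KL anchor close to the student's post-distillation behavior, which is the stability rationale the table gives for this choice.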