Mingyuyang-1 committed on
Commit ab1232f · verified
1 Parent(s): e76e002

Update README.md

Files changed (1)
1. README.md +13 -0
README.md CHANGED
@@ -2,6 +2,9 @@
 base_model: meta-llama/Llama-3.2-1B-Instruct
 datasets:
 - JunxiongWang/sftdatasetv3
+- HuggingFaceH4/ultrafeedback_binarized
+- HuggingFaceH4/orca_dpo_pairs
+- JunxiongWang/llama3-ultrafeedback-armorm
 model-index:
 - name: X-EcoMLA-1B1B-dynamic-0.95-DPO
   results: []
@@ -34,6 +37,16 @@ The X-EcoMLA models are not trained from scratch. Instead, they are composed fro
 | 3. SFT | End-to-End Knowledge Distillation | The initialized model is fine-tuned via knowledge distillation. |
 | 4. Alignment | Direct Preference Optimization (DPO) | In the final stage, DPO is used to align the model's preferences, with the distilled student model itself serving as the reference model for stability. |
 
+## Training Data
+
+| Stage | Dataset | License |
+|-------|---------|---------|
+| SFT | https://huggingface.co/datasets/teknium/OpenHermes-2.5 | Refer to source materials |
+| SFT | https://huggingface.co/datasets/tomg-group-umd/GenQA | CC BY-NC 4.0 |
+| SFT | https://huggingface.co/datasets/BAAI/Infinity-Instruct | CC BY-SA 4.0 |
+| DPO | https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized | MIT |
+| DPO | https://huggingface.co/datasets/HuggingFaceH4/orca_dpo_pairs | MIT |
+| DPO | https://huggingface.co/datasets/JunxiongWang/llama3-ultrafeedback-armorm | MIT |
 
 ## Getting Started
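The pipeline table in the diff above notes that the DPO stage uses the distilled student itself as the reference model. As a rough, illustrative sketch (not part of this commit or the upstream training code), this is roughly what that setup could look like with TRL's `DPOTrainer`, where passing `ref_model=None` makes the trainer clone the policy model as a frozen reference; the checkpoint path, dataset split, and hyperparameters are placeholder assumptions.

```python
# Illustrative sketch only -- not the X-EcoMLA training code from this repo.
# Shows DPO with the student model serving as its own (frozen) reference,
# as described in the pipeline table above. Paths and hyperparameters are placeholders.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

student_ckpt = "path/to/distilled-student"  # hypothetical SFT-distilled checkpoint
model = AutoModelForCausalLM.from_pretrained(student_ckpt)
tokenizer = AutoTokenizer.from_pretrained(student_ckpt)

# One of the DPO datasets added to the model-card metadata in this commit.
train_dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")

args = DPOConfig(
    output_dir="x-ecomla-dpo",
    beta=0.1,                        # assumed DPO temperature
    per_device_train_batch_size=2,
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,                  # None -> TRL clones `model` as the frozen reference,
                                     # i.e. the student acts as its own reference model
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,      # older TRL versions take `tokenizer=` instead
)
trainer.train()
```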