amd/Zebra-Llama-3B-14MLA-14Mamba-DPO
Mingyuyang-1 committed · Commit ac125bf · 1 Parent(s): 2ca8016

Update README.md

Files changed (1): README.md (+17 -2)
README.md CHANGED
@@ -1,8 +1,10 @@
 ---
 base_model:
-- meta-llama/Llama-3.2-3B-Instruct
+- amd/Zebra-Llama-3B-14MLA-14Mamba-SFT
 datasets:
-- JunxiongWang/sftdatasetv3
+- HuggingFaceH4/ultrafeedback_binarized
+- HuggingFaceH4/orca_dpo_pairs
+- JunxiongWang/llama3-ultrafeedback-armorm
 model-index:
 - name: Zebra-Llama-3B-14MLA-14Mamba-DPO
   results: []
@@ -43,6 +45,19 @@ The Zebra-Llama models are not trained from scratch. Instead, they are composed
 | 5. SFT | End-to-End Knowledge Distillation | The composed hybrid model is fine-tuned via knowledge distillation, using an 8B model as a teacher to transfer rich, pre-trained knowledge. |
 | 6. Alignment | Direct Preference Optimization (DPO) | In the final stage, DPO is used to align the model's preferences, with the distilled student model itself serving as the reference model for stability. |
 
+
+## Training Data
+
+| Stage   | Dataset                                                                   | License                   |
+|---------|---------------------------------------------------------------------------|---------------------------|
+| ILD/SFT | https://huggingface.co/datasets/teknium/OpenHermes-2.5                    | Refer to source materials |
+| ILD/SFT | https://huggingface.co/datasets/tomg-group-umd/GenQA                      | CC BY-NC 4.0              |
+| ILD/SFT | https://huggingface.co/datasets/BAAI/Infinity-Instruct                    | CC BY-SA 4.0              |
+| DPO     | https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized     | MIT                       |
+| DPO     | https://huggingface.co/datasets/HuggingFaceH4/orca_dpo_pairs              | MIT                       |
+| DPO     | https://huggingface.co/datasets/JunxiongWang/llama3-ultrafeedback-armorm  | MIT                       |
+
+
 ## Getting Started
 
 ### Installation
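
The alignment recipe this commit documents (DPO on the three preference datasets above, with the distilled SFT checkpoint serving as both the policy initialization and the frozen reference model) maps onto TRL's `DPOTrainer`, which the `alignment-handbook` tag suggests was the training stack. The following is a minimal sketch, not the authors' actual script: it assumes TRL >= 0.12 (`processing_class` kwarg), that the hybrid MLA/Mamba checkpoint loads through `AutoModelForCausalLM` with `trust_remote_code=True` (the real repo may require its own package), and all hyperparameter values shown are hypothetical placeholders.

```python
# Hedged DPO sketch mirroring the pipeline table: policy and reference
# are both initialized from the SFT checkpoint, then the policy is
# aligned on preference pairs while the reference stays frozen.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

SFT_CKPT = "amd/Zebra-Llama-3B-14MLA-14Mamba-SFT"

# Assumption: the hybrid checkpoint is loadable via trust_remote_code.
model = AutoModelForCausalLM.from_pretrained(SFT_CKPT, trust_remote_code=True)
ref_model = AutoModelForCausalLM.from_pretrained(SFT_CKPT, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(SFT_CKPT, trust_remote_code=True)

# One of the three DPO datasets added in this commit; its "train_prefs"
# split carries prompt/chosen/rejected preference pairs.
train_dataset = load_dataset(
    "HuggingFaceH4/ultrafeedback_binarized", split="train_prefs"
)

args = DPOConfig(
    output_dir="zebra-llama-3b-dpo",
    beta=0.1,                        # KL-penalty strength; placeholder value
    per_device_train_batch_size=2,   # placeholder
    gradient_accumulation_steps=8,   # placeholder
    learning_rate=5e-7,              # placeholder
    num_train_epochs=1,              # placeholder
)

trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,             # distilled student as the KL anchor
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,
)
trainer.train()
```

Passing the SFT checkpoint explicitly as `ref_model` pins the KL anchor to the distilled student itself, which is the stability choice described in row 6 of the pipeline table.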