ikala-ray committed
Commit 395eaee
Parent: f1891ae

Update README.md

Files changed (1): README.md (+163 −0)

README.md:
---
license: cc-by-nc-2.0
language:
- en
- zh
- ja
tags:
- sft
pipeline_tag: text-generation
widget:
- text: >-
    <|prompter|>What is a meme, and what's the history behind this
    word?<|endoftext|><|assistant|>
- text: <|prompter|>What's the Earth total population<|endoftext|><|assistant|>
- text: >-
    <|prompter|>Write a story about future of AI
    development<|endoftext|><|assistant|>
---

# Redpajama-3B SFT model

This model is based on RedPajama's 3B base model, fine-tuned on human demonstrations
of assistant conversations collected through the
[https://open-assistant.io/](https://open-assistant.io/) human feedback web
app before April 12, 2023.

## Model Details

- **Developed by:** [Open-Assistant Contributors](https://open-assistant.io/) and [iKala](https://ikala.ai/)
- **Model type:** Transformer-based Language Model
- **Language:** English, Chinese, Japanese
- **Finetuned from:** [togethercomputer/RedPajama-INCITE-Base-3B-v1](https://huggingface.co/togethercomputer/RedPajama-INCITE-Base-3B-v1)
- **Code:** [Open-Assistant/model/model_training](https://github.com/LAION-AI/Open-Assistant/tree/main/model/model_training)

## Prompting

Two special tokens are used to mark the beginning of user and assistant turns:
`<|prompter|>` and `<|assistant|>`. Each turn ends with a `<|endoftext|>` token.

Input prompt example:
```
<|prompter|>What is a meme, and what's the history behind this word?<|endoftext|><|assistant|>
```
The input ends with the `<|assistant|>` token to signal that the model should
start generating the assistant reply.

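For a quick start, this prompt format can be used directly with the `transformers` generation API. The snippet below is a minimal sketch, assuming the checkpoint id `ikala/redpajama-3b-chat` (the name used in the benchmark table below) and a single GPU; adjust the model id, device, and sampling parameters to your setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ikala/redpajama-3b-chat"  # assumed repo id, taken from the benchmark table below
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

# Build the prompt with the special tokens described above.
prompt = "<|prompter|>What is a meme, and what's the history behind this word?<|endoftext|><|assistant|>"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate the assistant turn; the sampling settings here are illustrative, not tuned.
output_ids = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    top_p=0.95,
    temperature=0.7,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

For a multi-turn conversation, concatenate the previous turns in the same format (`<|prompter|>...<|endoftext|><|assistant|>...<|endoftext|>`) before the final `<|assistant|>` token.
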
## Benchmark

| model | MMLU | BBH | HumanEval @10 |
|---|---|---|---|
| ikala/redpajama-3b-chat | 24.6 | 29.3 | 4.76 |
| ikala/bloom-zh-chat-3b | 31.4 | 30.18 | 0.0 |
| llama-7b (reference) | 30.9 | 27.6 | 10.3 |

## Dev Details

- base model: [togethercomputer/RedPajama-INCITE-Base-3B-v1](https://huggingface.co/togethercomputer/RedPajama-INCITE-Base-3B-v1)
- checkpoint: 1 epoch (6000 steps)

command: `deepspeed trainer_sft.py --configs defaults stablelm-7b oasst-mix --cache_dir /home/ubuntu/data_cache --output_dir .saved/stable-lm-7b-1 --num_train_epochs 4 --deepspeed`

data:
```
datasets:
  - wmt2019_zh-en:
      max_val_set: 1000
      max_train_set: 20000
  - ted_trans_en-ja:
      max_val_set: 1000
      max_train_set: 20000
  - ted_trans_zh-ja:
      max_val_set: 1000
      max_train_set: 20000
  - ikala:
      input_file_path: export_conversation_v4.4.jsonl
      val_split: 0.05
  - dolly15k:
      val_split: 0.05
  - oasst_export:
      lang: "bg,ca,cs,da,de,en,es,fr,hr,hu,it,nl,pl,pt,ro,ru,sl,sr,sv,uk,zh,ja,th,ko"
      input_file_path: 2023-04-12_oasst_release_ready_synth.jsonl.gz
      val_split: 0.05
  - joke
  - gsm8k
  - webgpt
```

The `ikala` dataset is internal and not publicly available, so if you want to reproduce this run, remove it from the datasets list above.

redpajama-3b:
```
redpajama-3b:
  dtype: fp16
  log_dir: "redpajama_3b"
  learning_rate: 1e-5
  model_name: saved_models/RedPajama-INCITE-Base-3B-v1
  output_dir: ikala_v4_3b
  weight_decay: 0.0
  max_length: 8196
  warmup_steps: 2000
  gradient_checkpointing: true
  gradient_accumulation_steps: 32
  per_device_train_batch_size: 1
  per_device_eval_batch_size: 2
  eval_steps: 500
  save_steps: 1000
  num_train_epochs: 8
  save_total_limit: 2
  deepspeed_config: configs/zero3_config_sft.json
```
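
For reference, the sketch below just spells out the effective batch size implied by the config above (`per_device_train_batch_size: 1` with `gradient_accumulation_steps: 32`); the GPU count is not stated in this card, so the global figure is an assumption for illustration only.

```python
# Effective batch size implied by the redpajama-3b config above.
per_device_train_batch_size = 1
gradient_accumulation_steps = 32
num_gpus = 8  # assumption: the actual GPU count is not given in this card

per_gpu = per_device_train_batch_size * gradient_accumulation_steps  # 32 sequences per optimizer step per GPU
global_batch = per_gpu * num_gpus                                    # 256 sequences per step with the assumed 8 GPUs

print(per_gpu, global_batch)
```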

zero config:
```
{
  "fp16": {
    "enabled": "auto",
    "loss_scale": 0,
    "loss_scale_window": 1000,
    "initial_scale_power": 16,
    "hysteresis": 2,
    "min_loss_scale": 1
  },
  "bf16": {
    "enabled": "auto"
  },
  "optimizer": {
    "type": "AdamW",
    "params": {
      "lr": "auto",
      "betas": "auto",
      "eps": "auto",
      "weight_decay": "auto"
    }
  },
  "scheduler": {
    "type": "WarmupDecayLR",
    "params": {
      "warmup_min_lr": "auto",
      "warmup_max_lr": "auto",
      "warmup_num_steps": "auto",
      "warmup_type": "linear",
      "total_num_steps": "auto"
    }
  },
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": true,
    "contiguous_gradients": true,
    "sub_group_size": 1e9,
    "reduce_bucket_size": "auto",
    "stage3_prefetch_bucket_size": "auto",
    "stage3_param_persistence_threshold": "auto",
    "stage3_max_live_parameters": 1e9,
    "stage3_max_reuse_distance": 1e9,
    "stage3_gather_16bit_weights_on_model_save": true
  },
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "steps_per_print": 2000,
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "wall_clock_breakdown": false
}
```