---
license: cc-by-nc-2.0
language:
- en
- zh
- ja
tags:
- sft
pipeline_tag: text-generation
widget:
- text: >-
    <|prompter|>What is a meme, and what's the history behind this
    word?<|endoftext|><|assistant|>
- text: <|prompter|>What's the Earth total population<|endoftext|><|assistant|>
- text: >-
    <|prompter|>Write a story about future of AI
    development<|endoftext|><|assistant|>
datasets:
- OpenAssistant/oasst1
- databricks/databricks-dolly-15k
- anon8231489123/ShareGPT_Vicuna_unfiltered
- LIUM/tedlium
- theblackcat102/joke_explaination
---

# RedPajama-3B SFT model

![](https://huggingface.co/ikala/redpajama-3b-chat/resolve/main/redpajama-example.png)

This model is RedPajama's 3B base model fine-tuned on human demonstrations
of assistant conversations collected through the
[https://open-assistant.io/](https://open-assistant.io/) human feedback web
app before April 12, 2023, among other datasets (see Dev Details).

Supervised fine-tuning used a sequence length of 5120 tokens.

## Model Details

- **Developed by:** [Open-Assistant Contributors](https://open-assistant.io/team) and [iKala](https://ikala.ai/)
- **Model type:** Transformer-based Language Model
- **Language:** English, Chinese, Japanese
- **Finetuned from:** [togethercomputer/RedPajama-INCITE-Base-3B-v1](https://huggingface.co/togethercomputer/RedPajama-INCITE-Base-3B-v1)
- **Code:** [Open-Assistant/model/model_training](https://github.com/LAION-AI/Open-Assistant/tree/main/model/model_training)
- **License:** Non-commercial (CC BY-NC 2.0)

## Prompting

Two special tokens are used to mark the beginning of user and assistant turns:
`<|prompter|>` and `<|assistant|>`. Each turn ends with a `<|endoftext|>` token.

Input prompt example:
```
<|prompter|>What is a meme, and what's the history behind this word?<|endoftext|><|assistant|>
```
The input ends with the `<|assistant|>` token to signal that the model should 
start generating the assistant reply.
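
A minimal sketch of building such a prompt in Python. The card only shows a single-turn example, so the multi-turn layout here is an assumption that extends the rule above (every turn, user or assistant, is closed with `<|endoftext|>`). `build_prompt` is a hypothetical helper, not part of any library.

```python
from typing import List, Tuple

# Special tokens from the "Prompting" section of this card.
PROMPTER = "<|prompter|>"
ASSISTANT = "<|assistant|>"
END = "<|endoftext|>"


def build_prompt(history: List[Tuple[str, str]], user_message: str) -> str:
    """Hypothetical helper: format a conversation for redpajama-3b-chat.

    history      -- earlier (user, assistant) exchanges, oldest first
    user_message -- the new user turn the model should answer
    """
    parts = []
    for user_turn, assistant_turn in history:
        # Assumption: completed turns on both sides are closed with <|endoftext|>.
        parts.append(f"{PROMPTER}{user_turn}{END}")
        parts.append(f"{ASSISTANT}{assistant_turn}{END}")
    # Close the new user turn, then leave <|assistant|> open so the
    # model continues by writing the reply.
    parts.append(f"{PROMPTER}{user_message}{END}{ASSISTANT}")
    return "".join(parts)


print(build_prompt([], "What is a meme, and what's the history behind this word?"))
# <|prompter|>What is a meme, and what's the history behind this word?<|endoftext|><|assistant|>
```

When generating, decoding would normally stop once the model emits `<|endoftext|>`, since that token closes the assistant turn.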

## Benchmark

| Model | MMLU | BBH | HumanEval (pass@10) |
|---|---|---|---|
| [ikala/redpajama-3b-chat](https://huggingface.co/ikala/redpajama-3b-chat)  |  24.6 | 29.3  |  4.8 |
| [ikala/bloom-zh-3b-chat](https://huggingface.co/ikala/bloom-zh-3b-chat)  | 31.4  | 30.2  | 0.0  |
| llama-7b (reference)  | 30.9  |  27.6 |  10.3 |
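
The card does not say how many samples per problem were drawn for the HumanEval column. HumanEval pass@k is commonly computed with the unbiased estimator introduced alongside the benchmark (Chen et al., 2021): with n samples per problem, c of which pass the unit tests, pass@k = 1 - C(n-c, k) / C(n, k), averaged over problems. A minimal sketch of that estimator, not necessarily the evaluation code behind the numbers above:

```python
from math import comb
from typing import Sequence, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset of the samples contains a correct one.
        return 1.0
    # 1 - P(a random size-k subset contains no correct sample)
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: Sequence[Tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given one (n, c) pair per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For k = 1 the estimator reduces to c / n, the plain fraction of passing samples.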


## Dev Details

- base model: [togethercomputer/RedPajama-INCITE-Base-3B-v1](https://huggingface.co/togethercomputer/RedPajama-INCITE-Base-3B-v1)
- checkpoint: 1 epoch (6000 steps)
- hardware: NVIDIA RTX A6000 x 4
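
As a rough sanity check on the hardware: the ZeRO paper's accounting for mixed-precision Adam puts model states at 16 bytes per parameter (2 for fp16 weights, 2 for fp16 gradients, 12 for fp32 master weights, momentum and variance), and ZeRO stage 3 (the `"stage": 3` setting in the DeepSpeed config on this card) partitions all of them across the data-parallel GPUs. The sketch below takes the nominal 3B parameters from the model name and ignores activations, temporary buffers and fragmentation, so it is a lower bound, not a measured figure.

```python
# Bytes per parameter for mixed-precision Adam, following the ZeRO paper's
# accounting: fp16 weights (2) + fp16 grads (2) + fp32 optimizer states (12).
BYTES_PER_PARAM = 2 + 2 + 12


def zero3_model_state_bytes_per_gpu(num_params: float, num_gpus: int) -> float:
    """Model-state memory per GPU when ZeRO stage 3 partitions weights,
    gradients and optimizer states evenly across num_gpus.
    Activations, temporary buffers and fragmentation are NOT included."""
    return num_params * BYTES_PER_PARAM / num_gpus


# Nominal 3B parameters (from the model name) on 4 x RTX A6000.
per_gpu = zero3_model_state_bytes_per_gpu(3e9, 4)
print(f"{per_gpu / 1e9:.1f} GB of model states per GPU")  # 12.0 GB of model states per GPU
```

Without partitioning, the same 16 bytes per parameter would be about 48 GB on every GPU, the entire memory of a 48 GB RTX A6000, before any activations.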


Training command: `deepspeed trainer_sft.py --configs defaults redpajama-3b datasets --num_train_epochs 2 --deepspeed`

Dataset configuration (`datasets`):
```
datasets:
  - wmt2019_zh-en:
      max_val_set: 1000
      max_train_set: 20000
  - ted_trans_en-ja:
      max_val_set: 1000
      max_train_set: 20000
  - ted_trans_zh-ja:
      max_val_set: 1000
      max_train_set: 20000
  - ikala:
      input_file_path: export_conversation_v4.4.jsonl
      val_split: 0.05
  - dolly15k:
      val_split: 0.05
  - oasst_export:
      lang: "bg,ca,cs,da,de,en,es,fr,hr,hu,it,nl,pl,pt,ro,ru,sl,sr,sv,uk,zh,ja,th,ko"
      input_file_path: 2023-04-12_oasst_release_ready_synth.jsonl.gz
      val_split: 0.05
  - joke
  - gsm8k
  - webgpt
```

The `ikala` entry is an internal dataset; to reproduce this training run, remove it from the list above.
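
One plausible reading of the per-dataset options above: `val_split` is the fraction of examples held out for validation, and `max_train_set` / `max_val_set` cap how many examples each split keeps. The sketch below illustrates that reading on a plain Python list. `split_and_cap` is a hypothetical helper, not an Open-Assistant function, and the trainer's actual shuffling, seeding and order of operations may differ.

```python
import random
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_and_cap(
    examples: Sequence[T],
    val_split: float = 0.05,
    max_train: Optional[int] = None,
    max_val: Optional[int] = None,
    seed: int = 0,
) -> Tuple[List[T], List[T]]:
    """Hypothetical helper: shuffle, hold out a val_split fraction, then cap each split."""
    if not 0.0 <= val_split < 1.0:
        raise ValueError("val_split must be in [0, 1)")
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # deterministic shuffle for a given seed
    n_val = round(len(shuffled) * val_split)
    val, train = shuffled[:n_val], shuffled[n_val:]
    # Caps are applied after the split, mirroring max_train_set / max_val_set.
    if max_train is not None:
        train = train[:max_train]
    if max_val is not None:
        val = val[:max_val]
    return train, val


train, val = split_and_cap(range(1000), val_split=0.05)
print(len(train), len(val))  # 950 50
```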

Model configuration (`redpajama-3b`):
```
redpajama-3b:
  dtype: fp16
  log_dir: "redpajama_3b"
  learning_rate: 1e-5
  model_name: saved_models/RedPajama-INCITE-Base-3B-v1
  output_dir: ikala_v4_3b
  weight_decay: 0.0
  max_length: 8196
  warmup_steps: 2000
  gradient_checkpointing: true
  gradient_accumulation_steps: 32
  per_device_train_batch_size: 1
  per_device_eval_batch_size: 2
  eval_steps: 500
  save_steps: 1000
  num_train_epochs: 8
  save_total_limit: 2
  deepspeed_config: configs/zero3_config_sft.json
```
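
With DeepSpeed, the global batch per optimizer step is the micro-batch size per GPU times the gradient-accumulation steps times the number of GPUs. The `"auto"` batch fields in the ZeRO config are presumably filled from these trainer settings. Plugging in `per_device_train_batch_size: 1`, `gradient_accumulation_steps: 32` and the four RTX A6000s listed under Dev Details:

```python
def global_batch_size(micro_batch_per_gpu: int, grad_accum_steps: int, num_gpus: int) -> int:
    """DeepSpeed's batch-size identity:
    train_batch_size = train_micro_batch_size_per_gpu * gradient_accumulation_steps * world_size."""
    return micro_batch_per_gpu * grad_accum_steps * num_gpus


# Values from the redpajama-3b config and the hardware line of this card.
batch = global_batch_size(micro_batch_per_gpu=1, grad_accum_steps=32, num_gpus=4)
print(batch)  # 128
```

Each optimizer step therefore sees 128 sequences, even though each GPU only holds one sequence at a time.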

DeepSpeed ZeRO stage 3 configuration (`configs/zero3_config_sft.json`):
```
{
  "fp16": {
    "enabled": "auto",
    "loss_scale": 0,
    "loss_scale_window": 1000,
    "initial_scale_power": 16,
    "hysteresis": 2,
    "min_loss_scale": 1
  },
  "bf16": {
    "enabled": "auto"
  },
  "optimizer": {
    "type": "AdamW",
    "params": {
      "lr": "auto",
      "betas": "auto",
      "eps": "auto",
      "weight_decay": "auto"
    }
  },
  "scheduler": {
    "type": "WarmupDecayLR",
    "params": {
      "warmup_min_lr": "auto",
      "warmup_max_lr": "auto",
      "warmup_num_steps": "auto",
      "warmup_type": "linear",
      "total_num_steps": "auto"
    }
  },
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": true,
    "contiguous_gradients": true,
    "sub_group_size": 1e9,
    "reduce_bucket_size": "auto",
    "stage3_prefetch_bucket_size": "auto",
    "stage3_param_persistence_threshold": "auto",
    "stage3_max_live_parameters": 1e9,
    "stage3_max_reuse_distance": 1e9,
    "stage3_gather_16bit_weights_on_model_save": true
  },
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "steps_per_print": 2000,
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "wall_clock_breakdown": false
}
```
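
The `WarmupDecayLR` scheduler above raises the learning rate linearly (`"warmup_type": "linear"`) from `warmup_min_lr` to `warmup_max_lr` over `warmup_num_steps`, then decays it linearly back toward `warmup_min_lr` by `total_num_steps`. The sketch below reproduces that shape in plain Python, assuming the `"auto"` fields resolve to `warmup_min_lr = 0`, `warmup_max_lr = 1e-5` (the `learning_rate` above) and `warmup_num_steps = 2000` (`warmup_steps`). `total_num_steps` depends on dataset size and epochs and is not given on this card, so the value used here is hypothetical. This is written from DeepSpeed's description of the scheduler, not copied from its source.

```python
def warmup_decay_lr(
    step: int,
    total_num_steps: int,
    warmup_num_steps: int = 2000,
    warmup_min_lr: float = 0.0,
    warmup_max_lr: float = 1e-5,
) -> float:
    """Linear warmup then linear decay (sketch of the WarmupDecayLR shape)."""
    if step < warmup_num_steps:
        # Warmup: fraction of the way from min to max learning rate.
        gamma = step / warmup_num_steps
    else:
        # Decay: fraction of post-warmup steps remaining, clamped at 0.
        gamma = max(0.0, (total_num_steps - step) / max(1, total_num_steps - warmup_num_steps))
    return warmup_min_lr + (warmup_max_lr - warmup_min_lr) * gamma


TOTAL_STEPS = 12000  # hypothetical; the real value depends on dataset size and epochs
for s in (0, 1000, 2000, 7000, 12000):
    print(s, warmup_decay_lr(s, TOTAL_STEPS))
```

The peak learning rate is reached exactly at step 2000, and it returns to `warmup_min_lr` at `total_num_steps`.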