File size: 27,482 Bytes
7af9c8f
 
 
 
 
548e80e
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
308df36
548e80e
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
25fba16
548e80e
 
 
 
 
 
 
 
 
 
 
25fba16
548e80e
 
 
 
25fba16
548e80e
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
---
license: other
license_name: bloomchat-176b-license-v1.0
license_link: LICENSE
---

# BLOOMChat V2.0

<!-- Provide a quick summary of what the model is/does. -->

BLOOMChat-v2 is a 176 billion parameter multilingual chat model. It is finetuned from [BLOOM (176B)](https://huggingface.co/bigscience/bloom) on long-sequence multilingual data and assistant-style conversation datasets. It supports conversation, question answering and generative answers in multiple languages.

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

- **Developed by:** [SambaNova Systems](https://sambanova.ai/)
- **Model type:** Language Model
- **Language(s):** Multiple; see [training data from BLOOM](https://huggingface.co/bigscience/bloom#training-data)
- **License:** BLOOMChat-176B LICENSE v1.0
- **Finetuned from model:** [BigScience Group BLOOM](https://huggingface.co/bigscience/bloom)

### Basic Information

<!-- Provide the basic links for the model. -->
- **Blog Post**: [Link](https://sambanova.ai/blog/bloomchat-v2)
- **Discord**: [Link](https://discord.com/invite/8z2Pe7cpRv)
- **Github**: [Link](https://github.com/sambanova/bloomchat)

### Licensing

To increase accessibility and to support the open-source community, SambaNova is releasing BLOOMChat under a modified version of the Apache 2.0 license, which includes use-based restrictions from BLOOM’s RAIL license. While use-based restrictions are necessarily passed through, there are no blanket restrictions on reuse, distribution, commercialization or adaptation. [Please review SambaNova’s BLOOMChat-176B License](LICENSE)

## Uses
<details>
<summary>Click to expand</summary>
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

### Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
This model is intended for commercial and research use.


### Out-of-Scope Use

<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->


BLOOMChat should NOT be used for:

- Mission-critical applications
- Applications that involve the safety of others
- Making highly important decisions
- Important automated pipelines

This model is still in early development and can be prone to mistakes and hallucinations, there is still room for improvement. This model is intended to provide the community with a multilingual chat LLM baseline.

### Recommendations

<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

Users should be made aware of the risks, biases, limitations, and restrictions of the model, which are listed down at the bottom of the page. 

</details>


---
## How to Get Started with the Model

<details>
<summary>Click to expand</summary>

### Loading in model with Huggingface

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("sambanovasystems/BLOOMChat-176B-v2")
model = AutoModelForCausalLM.from_pretrained("sambanovasystems/BLOOMChat-176B-v2", device_map="auto", torch_dtype="auto")
```

### Quick Start Inference on SambaNova's in-house Reconfigurable Dataflow Unit (RDU)

The inference code to run the model can be found our [github repo](https://github.com/sambanova/bloomchat/blob/main/rdu_quick_start/inference.py). This code requires the [SambaFlow](https://docs.sambanova.ai/developer/latest/sambaflow-intro.html) SDK to execute. For those interested in running models on RDUs, [please feel free to get in touch](https://sambanova.ai/getstarted).

### Quick Start Inference on GPU

First create a python virtual environment for these packages

```
python3 -m venv bloomchat_venv
source bloomchat_venv/bin/activate
pip install --upgrade pip
```

<!-- Please follow this section [Inference solutions for BLOOM 176B](https://github.com/huggingface/transformers-bloom-inference#bloom-inference-via-command-line) in the Huggingface Tutorial for environment set up and stop before the [BLOOM inference via command-line
](https://github.com/huggingface/transformers-bloom-inference#bloom-inference-via-command-line) section. -->

```
pip install flask flask_api gunicorn pydantic accelerate huggingface_hub>=0.9.0 deepspeed>=0.7.3 deepspeed-mii==0.0.2
```
And then
```
pip install transformers==4.27.0
```

You will see messages like this 
```
ERROR: deepspeed-mii 0.0.2 has requirement transformers==4.21.2, but you'll have transformers 4.27.0 which is incompatible.
Installing collected packages: transformers
  Found existing installation: transformers 4.21.2
    Uninstalling transformers-4.21.2:
      Successfully uninstalled transformers-4.21.2
Successfully installed transformers-4.27.0
```

Now let's git clone the [huggingface/transformers-bloom-inference](https://github.com/huggingface/transformers-bloom-inference) repo.
```
git clone https://github.com/huggingface/transformers-bloom-inference.git
cd transformers-bloom-inference/
```
And then you need to modify two files in this [transformers-bloom-inference](https://github.com/huggingface/transformers-bloom-inference) repo:

- Modifying `inference_server/models/hf_accelerate.py`
    - This is because for our testing of this repo we used 4 80GB A100 GPUs and would run into memory issues
- Modifying `inference_server/cli.py`
    - This is because the model was trained using specific human, bot tags
    - Trailing spaces may lead to subpar performance

Modifications for `inference_server/models/hf_accelerate.py`:

```diff
diff --git a/inference_server/models/hf_accelerate.py b/inference_server/models/hf_accelerate.py
index 9be3c3f..a8ecb1d 100644
--- a/inference_server/models/hf_accelerate.py
+++ b/inference_server/models/hf_accelerate.py
@@ -1,4 +1,5 @@
 from argparse import Namespace
+from accelerate.utils.modeling import get_max_memory
 
 import torch
 
@@ -12,6 +13,12 @@ class HFAccelerateModel(Model):
 
         kwargs = {"pretrained_model_name_or_path": args.model_name, "device_map": "auto"}
 
+        original_max_memory_dict = get_max_memory()
+
+        reduce_max_memory_dict = {device_key: int(original_max_memory_dict[device_key] * 0.85) for device_key in original_max_memory_dict}
+
+        kwargs["max_memory"] = reduce_max_memory_dict
+
         if get_world_size() > 1:
             kwargs["device_map"] = "balanced_low_0"

```

Modifications for `inference_server/cli.py`:

```diff
diff --git a/inference_server/cli.py b/inference_server/cli.py
index fc903d5..5450236 100644
--- a/inference_server/cli.py
+++ b/inference_server/cli.py
@@ -22,6 +22,9 @@ def main() -> None:
     while True:
         input_text = input("Input text: ")
 
+        input_text = input_text.strip()
+        modified_input_text = f"<human>: {input_text}\n<bot>:"
+
         if input("change generate_kwargs? [y/n] ") == "y":
             while True:
                 try:
@@ -33,7 +36,7 @@ def main() -> None:
                     print("message =", e_message)
                     continue
 
-        response = model.generate(text=[input_text], generate_kwargs=generate_kwargs)
+        response = model.generate(text=[modified_input_text], generate_kwargs=generate_kwargs)
 
         print_rank_0("Output text:", response.text[0])
         print_rank_0("Generated tokens:", response.num_generated_tokens[0])

```
And now you are good to go!

Running command for bf16, NO sampling
```
python -m inference_server.cli --model_name sambanovasystems/BLOOMChat-176B-v2 --model_class AutoModelForCausalLM --dtype bf16 --deployment_framework hf_accelerate --generate_kwargs '{"do_sample": false, "max_new_tokens": 512}'
```
Running command for bf16, YES sampling
```
python -m inference_server.cli --model_name sambanovasystems/BLOOMChat-176B-v2 --model_class AutoModelForCausalLM --dtype bf16 --deployment_framework hf_accelerate --generate_kwargs '{"do_sample": true, "temperature": 0.8, "repetition_penalty": 1.2, "top_p": 0.9, "max_new_tokens": 512}'
```
---
Running command for int8 (sub optimal performance, but fast inference time) NO sampling:
```
python -m inference_server.cli --model_name sambanovasystems/BLOOMChat-176B-v2 --model_class AutoModelForCausalLM --dtype int8 --deployment_framework hf_accelerate --generate_kwargs '{"do_sample": false, "max_new_tokens": 512}'
```
Running command for int8 (sub optimal performance, but fast inference time) YES sampling:
```
python -m inference_server.cli --model_name sambanovasystems/BLOOMChat-176B-v2 --model_class AutoModelForCausalLM --dtype int8 --deployment_framework hf_accelerate --generate_kwargs '{"do_sample": true, "temperature": 0.8, "repetition_penalty": 1.2, "top_p": 0.9, "max_new_tokens": 512}'
```

**DISCLAIMER:** When using int8, the results will be subpar compared to bf16 as the model is being [quantized](https://huggingface.co/blog/hf-bitsandbytes-integration#introduction-to-model-quantization).

### Suggested Inference Parameters
- Temperature: 0.8
- Repetition penalty: 1.2
- Top-p: 0.9
- Max generated tokens: 512

### Suggested Prompts To Try in GPU Tutorial
```
Input text: Write a script in which Bob accidentally breaks his dad's guitar
```

```
Input text: Create an itemized list of tasks to complete to start a clothing brand
```

```
Input text: 十七岁的风是什么颜色的?
```


</details>

---

## Some example completions for English

<details>
<summary>Click to expand</summary>

```
<human>: What does it mean for the Fed to raise or lower interest rates?
<bot>: Lowering interest rates means that banks will be able to borrow from the Federal Reserve at a cheaper rate. This makes it more attractive for banks to borrow money and increase their lending activities, as lowering interest rates makes it more attractive for people to borrow money. This increases the money supply and is typically seen as a stimulus for the economy. Conversely, raising interest rates makes it more expensive for banks to borrow money, which reduces their incentive to lend. This reduces the money supply and is typically seen as a way to slow down the economy and reduce inflation.
```

```
<human>: Write an engaging and well-written property listing description for selling a house. Address of the house and some of the details are given to you. Fill in the information gap with hallucinations if needed.

Property Address: 412 Monterey Ave, Capitola, CA 95010
Parking: 3 cars + a finished single car garage
Details:
- open floorplan
- window with views to park/mountains
- Kitchen with shaker cabinetry and white Calcatta quartz counters
<bot>: Welcome to 412 Monterey Ave, Capitola, CA 95010! This beautiful home offers an open floorplan, shaker cabinetry, and a finished single car garage. With its modern design, this house is a sight to behold. The window in the living room provides a view of the park and the mountains, bringing the outdoors in. The Kitchen features white Calcatta quartz counters and is a chef's dream. It is perfect for entertaining with its open layout and plenty of space for dining. Don't miss out on this opportunity to call this house your home!
```

```
<human>: What is artificial intelligence? Explain it so an 8 year old will understand.
<bot>: That's a great question. Artificial Intelligence (AI) is a computer program or machine that can do things that would normally require human intelligence to do. AI can do things like understand natural language, learn from experience, recognize objects, and make decisions.

```

</details>

---

## Some example completions for Multilingual
<details>
<summary>Click to expand</summary>

#### Arabic

```
<human>: هل العمل نصف دوام فكرة جيدة لمراهقين بين عمر ١٥ و ١٩ عام
<bot>: نعم ، يمكن أن يكون العمل نصف دوام فكرة جيدة للمراهقين بين عمري 15 و 19 عامًا ، حيث يمكنهم من خلالها اكتساب خبرة العمل الأولى وتنمية مهارات العمل المهمة. بالإضافة إلى ذلك، يمكن أن يكون العمل نصف دوام فرصة جيدة للمراهقين لتعلم كيفية إدارة الوقت بشكل جيد.
ومع ذلك، من المهم أن تراعي الوظائف التي يتم شغلها خلال نصف الدوام احتياجات المراهقين وتجنب أي ضرر ممكن. بالإضافة إلى ذلك، من المهم أن يكون لدى المراهقين وقت فراغ كافٍ للاسترخاء والتعرف على أقرانهم وتنمية مهاراتهم الأخرى.
بالإضافة إلى ذلك، من المهم أن يدعم الآباء عمل المراهقين، ويشجعونهم على الاستمرار في الدراسة بنجاح، ويقدمون الدعم العاطفي لهم.
```

**DISCLAIMER**: Text ordering may be off for Arabic due to code block not being able to render
#### Chinese

```
<human>: 请向我推荐一部周末看的电影并说明原因。
<bot>: 我推荐的电影是疯狂动物城(Zootopia)。这部电影讲述了一个发生在动物城市里的故事。主人公是一个名叫Judy的兔子警官,她也是一个初来乍到的新人。因为初来乍到,所以她受到了其他动物同事的排挤。不过她并没有气馁,而是用自己的行动证明了自己。在解决 一个一个案子,打击罪犯的同时,她也在这个过程中收获了友情和爱情。这部电影不仅有深刻的教育意义,而且还有好看的动作和搞笑的场面。所以我觉得这是一部很适合周末看的电影。

```

#### French

```
<human>: Il y a t’il un petit-déjeuner de 700-1000 calories sans oeuf mais qui inclut des protéines.
<bot>: 
```

#### Hindi
```
<human>: दिल्ली से जयपुर कैसे जाये? बस से या कार से? दोनो के बारे मे विस्तार मे बताये.
<bot>: आप दिल्ली से जयपुर कार से या बस से जा सकते है, दोनों ही तरीके आसान है।
अगर आप कार से यात्रा करते है तो यह यात्रा 4 घंटे में पूरी हो जाएगी। आपको दिल्ली से जयपुर के लिए कुछ पॉपुलर मार्ग इस तरह से है -
1. नेशनल हाई-वे 2 (NH 2) - यह दिल्ली से जयपुर जाने का सबसे पॉपुलर मार्ग है और यह पूरी तरह से एक फोर-लेन मार्ग है। इस मार्ग पर आप बिना किसी रोक-टोक के दिल्ली से जयपुर तक पहुंच सकते है।
2. दिल्ली-जयपुर मार्ग (Delhi-Jaipur Highway) - यह दिल्ली से जयपुर जाने का एक और आसान मार्ग है। यह जयपुर से दिल्ली की और जाने वाले NH 2 को जयपुर में दिल्ली-जयपुर राष्ट्रीय राजमार्ग (Delhi-Jaipur National Highway) कहते है।
3. दिल्ली-जयपुर मार्ग (Delhi-Jaipur Highway) - यह NH 2 दिल्ली से जयपुर तक जाने का एक और मार्ग है। यह दिल्ली से जयपुर की और जाने वाले NH 2 से जुड़ा हुआ है और दिल्ली से जयपुर तक सीधा जाता है।
4. दिल्ली-जयपुर मार्ग (Delhi-Jaipur Highway) - यह NH 2 दिल्ली से जयपुर तक जाने का एक और मार्ग है। यह दिल्ली से जयपुर की और जाने वाले NH 2 से जुड़ा हुआ है और दिल्ली से जयपुर तक सीधा जाता है।
जयपुर जाने के लिए आप इनमें से किसी भी मार्ग का चुनाव कर सकते है और आसानी से जयपुर पहुंच सकते है।
जयपुर जाने के लिए आप सरकारी बसों का भी चुनाव कर सकते है। दिल्ली से जयपुर के लिए कई बसों की सुविधा उपलब्ध है। दिल्ली से जयपुर के लिए आप दिल्ली के सराय रोहिल्ला बस अड्डे से आसानी से बस ले सकते है। सराय रोहिल्ला से जयपुर के लिए कई बसें उपलब्ध है जो आपको जयपुर पहुंचा देगी।
जयपुर जाने के लिए आप प्राइवेट कैब का भी उपयोग कर सकते है। यह सुविधा आप ऑनलाइन भी ले सकते है। कैब से दिल्ली से जयपुर जाने में आपको लगभग 4 घंटे लगेंगे।
```

#### Spanish

```
<human>: ¿Me puedes dar algunas recomendaciones de juegos? Estoy buscando juegos de rol con un sistema de magia único, un sistema de construcción único y una buena historia. Por favor limítate a juegos que se puedan jugar en PC directamente, o por emulación.
<bot>: Claro, aquí están algunas opciones que podrían interesarte:

1.- "Spellstorm": Este juego de rol tiene un sistema de magia único que se basa en la manipulación de los elementos y la creación de tormentas de magia. El sistema de construcción es basado en habilidades y la historia es interesante, con una ambientación de fantasía oscura.

2.- "Infinity Engine": Este es un sistema de juego de rol creado por "Rol Team B", que cuenta con varios títulos de calidad. El sistema de magia es único y se basa en la manipulación de portales interdimensionales. El sistema de construcción es basado en habilidades y la historia es interesante, con una ambientación de fantasía.

3.- "Arcana Evolved": Este juego de rol tiene un sistema de magia único que se basa en la manipulación de las leyes fundamentales del universo. El sistema de construcción es basado en habilidades y atributos, y la historia es interesante, con una ambientación de ciencia ficción.

4.- "Fate": Este es un sistema de juego de rol que se enfoca en la narrativa y la toma de decisiones. El sistema de magia es inexistente, pero la historia es muy interesante, con una ambientación de fantasía y ciencia ficción.

Espero que estas opciones te sean de ayuda.

```


</details>

---

## Training Details

<details>
<summary>Click to expand</summary>

### Training Data

<!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

#### Long Sequence Pretraining Data
- [mc4 3.1.0](https://huggingface.co/datasets/allenai/c4/tree/mC4_3.1.0)
- [RefinedWeb](https://huggingface.co/datasets/tiiuae/falcon-refinedweb)
- [StarCoder](https://huggingface.co/datasets/bigcode/starcoderdata)
- [The PILE](https://huggingface.co/datasets/EleutherAI/the_pile_deduplicated)
- [Pile of Law](https://huggingface.co/datasets/pile-of-law/pile-of-law)
- 10-Ks and 10-Qs from [EDGAR](https://www.sec.gov/edgar/search-and-access)
- arXiv papers from Economics and Quantitative Finance
- YouTube transcripts from finance-related public YouTube videos

#### Finetuning Data
- [OIG dataset from OpenChatKit](https://huggingface.co/datasets/laion/OIG)
- [Dolly 2.0](https://huggingface.co/datasets/databricks/databricks-dolly-15k)
- [Oasst1](https://huggingface.co/datasets/OpenAssistant/oasst1)

### Training Procedure 

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

We trained BLOOMChat-v2 with [SambaNova DataScale systems](https://sambanova.ai/products/datascale/) with SambaNova's in-house Reconfigurable Dataflow Unit (RDU). We started from [BLOOM (176B)](https://huggingface.co/bigscience/bloom), an open-source multilingual LLM pretrained by the [BigScience group](https://huggingface.co/bigscience). 
We then continued pretraining the model on an in-house mix of multilingual text (see the full breakdown above). 

We instruction-tune the resulting model on OpenChatKit with each data source subsampled to 100k for one epoch, followed by sixteen epochs over the combined OpenChatKit and Dolly 2.0. 
All of the code used to prepare the finetuning datasets and the scripts to run finetuning and inference are open-sourced and freely available at [sambanova/bloomchat](https://github.com/sambanova/bloomchat/tree/main)


### Prompting Style Used For Training
```
<human>: {input1 that the user wants from the bot}
<bot>: {response1}</s>
<human>: {input2 that the user wants from the bot}
<bot>: {response2}</s>
```

### Hyperparameters

**Long-sequence Pretraining**
- Hardware: SambaNova Reconfigurable Dataflow Unit (RDU)
- Optimizer: AdamW
- Grad accumulation: 1
- Steps: 1620
- Global Batch size: 2048
- Batch tokens: 2048 * 8192 = 16,777,216 tokens
- Learning Rate: 6e-6
- Learning Rate Scheduler: Flat
- Warmup Steps: 0
- Weight decay: 0.1

**Instruction-tuned Training on OIG**

- Hardware: SambaNova Reconfigurable Dataflow Unit (RDU)
- Optimizer: AdamW
- Grad accumulation: 1
- Epochs: 1
- Global Batch size: 128
- Batch tokens: 128 * 8192 = 1,048,576 tokens
- Learning Rate: 6e-6
- Learning Rate Scheduler: Cosine Schedule with Warmup
- Warmup Steps: 0
- End Learning Ratio: 0.1
- Weight decay: 0.1

**Instruction-tuned Training on Dolly 2.0 and Oasst1**

- Hardware: SambaNova Reconfigurable Dataflow Unit (RDU)
- Optimizer: AdamW
- Grad accumulation: 1
- Epochs: 16
- Global Batch size: 128
- Batch tokens: 128 * 8192 = 1,048,576 tokens
- Learning Rate: 6e-6
- Learning Rate Scheduler: Cosine Schedule with Warmup
- Warmup Steps: 0
- End Learning Ratio: 0.1
- Weight decay: 0.1

</details>

---

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

Like all LLMs, BLOOMChat has certain limitations:
- Hallucination: BLOOMChat may sometimes generate responses that contain plausible-sounding but factually incorrect or irrelevant information.
- Code Switching: The model might unintentionally switch between languages or dialects within a single response, affecting the coherence and understandability of the output.
- Repetition: BLOOMChat may produce repetitive phrases or sentences, leading to less engaging and informative responses.
- Coding and Math: The model's performance in generating accurate code or solving complex mathematical problems may be limited.
- Toxicity: BLOOMChat may inadvertently generate responses containing inappropriate or harmful content.

## Acknowledgments

We would like to extend our gratitude to [Together](https://www.together.xyz/) for their contributions to BLOOMChat-v1, without which BLOOMChat-v2 would not have been possible.

We are grateful to the various researchers and open-source projects that have contributed to the development of BLOOMChat-v2. We thank [BigScience](https://bigscience.huggingface.co/) for providing the [BLOOM](https://huggingface.co/bigscience/bloom) model, which served as the base for BLOOMChat-v2. 
For our long-sequence pretraining data: we thank [Common Crawl](https://commoncrawl.org/), [Google Research](https://research.google/), and [Allen Institute for AI](https://allenai.org/) for their contributions in making mc4-3.1.0 (avilable on HuggingFace [here](https://huggingface.co/datasets/allenai/c4/tree/mC4_3.1.0)) possible. We thank the [Technology Innovation Institute](https://www.tii.ae/) for [Falcon RefinedWeb](https://huggingface.co/datasets/tiiuae/falcon-refinedweb).
We thank [BigCode](https://huggingface.co/bigcode) for [StarCoderData](https://huggingface.co/datasets/bigcode/starcoderdata), [EleutherAI](https://www.eleuther.ai/) for [The PILE](https://huggingface.co/datasets/EleutherAI/the_pile_deduplicated), the authors of [Pile of Law](https://huggingface.co/datasets/pile-of-law/pile-of-law) for their dataset, and the authors of [sec-edgar](https://github.com/sec-edgar/sec-edgar) for their convenient Python library for downloading SEC filings.

For our finetuning data, we thank [LAION](https://laion.ai/) for their [OIG dataset](https://huggingface.co/datasets/laion/OIG), OpenAssistant Conversations Dataset ([OASST1](https://huggingface.co/datasets/OpenAssistant/oasst1)) and also thank [Databricks](https://www.databricks.com/) for providing [Dolly 2.0](https://huggingface.co/datasets/databricks/databricks-dolly-15k).

We appreciate [lm-eval-harness](https://github.com/EleutherAI/lm-evaluation-harness) and [BigScience](https://bigscience.huggingface.co/) for their essential benchmarking contributions, which is very helpful in evaluating BLOOMChat's performance. We appreciate the inspiration from the wave of various recent open-source chat models, including [OpenAssistant-30B](https://huggingface.co/OpenAssistant/oasst-sft-7-llama-30b-xor), [LLaMA-Adapter-V2-65B](https://github.com/ZrrSkywalker/LLaMA-Adapter/tree/main/llama_adapter_v2_chat65b), [Vicuna-13b](https://huggingface.co/lmsys/vicuna-13b-delta-v0), [Koala-13b](https://huggingface.co/TheBloke/koala-13B-HF), [OASST-Pythia-12b](https://huggingface.co/OpenAssistant/oasst-sft-1-pythia-12b), [Alpaca-13b](https://huggingface.co/anon8231489123/gpt4-x-alpaca-13b-native-4bit-128g), [ChatGLM-6b](https://github.com/THUDM/ChatGLM-6B), [FastChat-T5-3b](https://huggingface.co/lmsys/fastchat-t5-3b-v1.0), [Dolly-v2-12b](https://huggingface.co/databricks/dolly-v2-12b), [LLaMA-13b](https://ai.facebook.com/blog/large-language-model-llama-meta-ai/), [StableLM-Tuned-Alpha-7b](https://huggingface.co/stabilityai/stablelm-tuned-alpha-7b), [RedPajama-INCITE-Chat-7B-v0.1](https://huggingface.co/togethercomputer/RedPajama-INCITE-Chat-7B-v0.1), [RedPajama-INCITE-Chat-3B-v1](https://huggingface.co/togethercomputer/RedPajama-INCITE-Base-7B-v0.1), [MPT-7B-Chat](https://huggingface.co/mosaicml/mpt-7b-chat) and so on. We look forward to witnessing the continued growth and success of open-source chat-based models.

We highly appreciate the hard work and dedication of these researchers and organizations towards the advancement of the open-source community. Their contributions were invaluable in the development of BLOOMChat, and we hope that our model can contribute to further advancements in the field.

## Cite BLOOMChat
```
@software{bloomchat-v2,
  title = {{BLOOMChat-v2: an Open Multilingual Chat LLM for Long Sequences}},
  author = {SambaNova Systems},
  url = {https://huggingface.co/sambanovasystems/BLOOMChat-176B-v2}
  month = {2},
  year = {2024},
  version = {2.0},
}
```