jayr014 committed on
Commit
6aee32f
1 Parent(s): f63e0ab

renaming OIG dataset in one section, updating the out of scope, removed misuse and malicious use as it is stated in the restrictions, updated quick summary

Files changed (1)
  1. README.md +14 -19
README.md CHANGED
@@ -8,7 +8,7 @@ license: apache-2.0
 
  <!-- Provide a quick summary of what the model is/does. -->
 
- BLOOMChat is based on [BigScience Group BLOOM model](https://huggingface.co/bigscience/bloom), and is instruction-tuned on a subset of 100k datapoints per data source from the [OIG dataset](https://huggingface.co/datasets/laion/OIG) from the [OpenChatKit](https://www.together.xyz/blog/openchatkit). Then aligned using [Dolly 2.0](https://huggingface.co/datasets/databricks/databricks-dolly-15k) and [Oasst1](https://huggingface.co/datasets/OpenAssistant/oasst1).
+ BLOOMChat is [BigScience Group BLOOM model](https://huggingface.co/bigscience/bloom) instruction-tuned on a subset of 100k datapoints per data source from the [OIG dataset](https://huggingface.co/datasets/laion/OIG) from the [OpenChatKit](https://www.together.xyz/blog/openchatkit). Then aligned using [Dolly 2.0](https://huggingface.co/datasets/databricks/databricks-dolly-15k) and [Oasst1](https://huggingface.co/datasets/OpenAssistant/oasst1).
 
  ## Model Details
 
@@ -16,7 +16,8 @@ BLOOMChat is based on [BigScience Group BLOOM model](https://huggingface.co/bigs
 
  <!-- Provide a longer summary of what this model is. -->
 
- - **Developed by:** [SambaNova Systems](https://sambanova.ai/) and [Together Computer](https://www.together.xyz/)
+ - **Developed by:** [SambaNova Systems](https://sambanova.ai/)
+ - **Co-developed by:** [Together Computer](https://www.together.xyz/)
  - **Model type:** Language Model
  - **Language(s):** Multiple; see [training data from BLOOM](https://huggingface.co/bigscience/bloom#training-data)
  - **License:** apache-2.0 with RAIL restrictions
@@ -41,7 +42,16 @@ This model is intended for commercial and research use.
 
  <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
 
- BLOOMChat is intended for chatbot applications and may not perform well in use cases outside of the intended use. BLOOMChat should NOT be used for safety-critical applications or for making decisions that have a significant impact on individuals or society.
+
+ Bloom chat should NOT be used for:
+
+ - Mission-critical applications
+ - Applications that involve the safety of others
+ - Making highly important decisions
+ - Important automated pipelines/decisions
+
+ This model is still in early development and can be prone to mistakes and hallucinations, there is still room for improvement. This model is intended to provide the community with a good baseline.
+
 
 
  ### Recommendations
@@ -193,7 +203,7 @@ In the nucleus of atoms, protons and neutrons are bound together in a structure
 
  <!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
 
- - [OIG dataset](https://huggingface.co/datasets/laion/OIG)
+ - [OIG dataset from OpenChatKit](https://huggingface.co/datasets/laion/OIG)
  - [Dolly 2.0](https://huggingface.co/datasets/databricks/databricks-dolly-15k)
  - [Oasst1](https://huggingface.co/datasets/OpenAssistant/oasst1)
 
@@ -282,19 +292,4 @@ Like all LLMs, BLOOMChat has certain limitations:
  - Coding and Math: The model's performance in generating accurate code or solving complex mathematical problems may be limited.
  - Toxicity: BLOOMChat may inadvertently generate responses containing inappropriate or harmful content.
 
- ### Misuse and Malicious Use
-
- BLOOMChat is designed for use in chatbot applications and should not be used for any other purpose. Misuse of the model, such as using it to engage in illegal or unethical activities, is strictly prohibited and goes against the principles of the BLOOMChat community project.
-
- Using the model to generate content that is cruel to individuals is a misuse of this model. This includes, but is not limited to:
-
- - Generating fake news, misinformation, or propaganda
- - Promoting hate speech, discrimination, or violence against individuals or groups
- - Impersonating individuals or organizations without their consent
- - Engaging in cyberbullying or harassment
- - Defamatory content
- - Spamming or scamming
- - Sharing confidential or sensitive information without proper authorization
- - Violating the terms of use of the model or the data used to train it
- - Creating automated bots for malicious purposes such as spreading malware, phishing scams, or spamming
 