YX-Cerebras committed
Commit f324be0 • 1 Parent(s): 9c49448
Update README.md
README.md CHANGED
@@ -24,7 +24,7 @@ BTLM-3B-8k-chat is a chat version of the [BTLM-3B-8K](cerebras/btlm-3b-8k-base)
 
 BTLM-3B-8k-chat:
 - **Licensed for commercial use** (Apache 2.0).
-- **+2.26% improvement on
+- **+2.26% improvement on 10 downstream tasks and MMLU over BTLM base model**.
 - **Improved chat capabilities**.
 - **Reduced harmfulness and increased helpfulness**.
 
@@ -91,7 +91,10 @@ print(generated_text['generated_text'])
 
 ### Performance vs BTLM-3B-8k model
 ![figure_1_image](./BTLMvsBTLM-Chat.png)
-Figure 1. Performance comparison with base model across 12 tasks.
+Figure 1. Performance comparison with base model across 11 tasks.
+
+![table_1_image](./BTLMvsBTLM-Chat_detail.png)
+Table 1: Detailed downstream task comparisons. MMLU task performance is reported using 5-shot; other tasks are 0-shot.
 
 
 ## Training Details
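The hunk above is anchored on the closing line of the README's generation example (`print(generated_text['generated_text'])`), which this commit does not change. For context, here is a minimal sketch of what such a call could look like with the `transformers` text-generation pipeline; the repo id, prompt, and generation settings are assumptions, and only the final `print` line comes from the hunk context.

```python
from transformers import pipeline

# Hypothetical reconstruction of the README's usage example.
# Only the final print(...) line appears in the diff context above;
# the repo id, prompt, and max_new_tokens value are assumptions.
pipe = pipeline(
    "text-generation",
    model="cerebras/btlm-3b-8k-chat",  # assumed repo id
    trust_remote_code=True,            # BTLM ships custom model code
)

prompt = "What is a manometer?"
generated_text = pipe(prompt, max_new_tokens=50)[0]
print(generated_text['generated_text'])
```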
@@ -100,8 +103,8 @@ Figure 1. Performance comparison with base model across 12 tasks.
 - Learning rate: 5e-5
 - Batch size: 64
 - 1 Epoch
-- Lora r: 128
 - Dropout: 0
+- Lora r: 128
 - Lora alpha: 16
 - Beta: 0.05
 - Learn more: [BTLM-3B-8k-chat blog](blogpage)
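The Training Details hunk only moves the `Lora r` entry; the hyperparameters themselves are unchanged. For readers who want to see how those values could map onto a concrete fine-tuning setup, below is a minimal sketch assuming Hugging Face `peft` and `trl`. Reading `Beta: 0.05` as a DPO-style preference-tuning coefficient is an inference, not something the README states, and the model, dataset, and `output_dir` are hypothetical.

```python
from peft import LoraConfig
from trl import DPOConfig

# Values below are copied from the Training Details list in the diff.
# Mapping them onto peft + trl is an assumption; the README does not
# name the training framework or objective.
peft_config = LoraConfig(
    r=128,             # Lora r
    lora_alpha=16,     # Lora alpha
    lora_dropout=0.0,  # Dropout
    task_type="CAUSAL_LM",
)

training_args = DPOConfig(
    learning_rate=5e-5,              # Learning rate
    per_device_train_batch_size=64,  # Batch size (global vs. per-device not specified)
    num_train_epochs=1,              # 1 Epoch
    beta=0.05,                       # Beta
    output_dir="btlm-3b-8k-chat-dpo",  # hypothetical
)

# A trainer wired up with these configs would then look roughly like:
# trainer = DPOTrainer(model=model, args=training_args,
#                      train_dataset=preference_pairs, peft_config=peft_config)
# trainer.train()
```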
@@ -181,3 +184,5 @@ Through sheer grit and unwavering commitment they persevere towards achieving th
 - **Human life:** The outputs from this model may or may not align with human values. The risk needs to be thoroughly investigated before deploying this model in a production environment where it can directly impact human life.
 - **Risks and harms:** There may be distributional bias in the [RedPajama dataset](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T) that can manifest in various forms in the downstream model deployment. There are other risks associated with large language models such as amplifying stereotypes, memorizing training data, or revealing private or secure information.
 
+## Acknowledgements
+We are thankful to all Cerebras engineers that made this work possible.