YX-Cerebras committed
Commit f324be0 (1 parent: 9c49448)

Update README.md

Files changed (1)
  1. README.md +8 -3
README.md CHANGED
@@ -24,7 +24,7 @@ BTLM-3B-8k-chat is a chat version of the [BTLM-3B-8K](cerebras/btlm-3b-8k-base)

  BTLM-3B-8k-chat:
  - **Licensed for commercial use** (Apache 2.0).
- - **+2.26% improvement on BTLM Eleuther Harness tasks over BTLM base model**.
+ - **+2.26% improvement on 10 downstream tasks and MMLU over the BTLM base model**.
  - **Improved chat capabilities**.
  - **Reduced harmfulness and increased helpfulness**.

@@ -91,7 +91,10 @@ print(generated_text['generated_text'])

  ### Performance vs BTLM-3B-8k model
  ![figure_1_image](./BTLMvsBTLM-Chat.png)
- Figure 1. Performance comparison with base model across 12 tasks.
+ Figure 1. Performance comparison with the base model across 11 tasks.
+
+ ![table_1_image](./BTLMvsBTLM-Chat_detail.png)
+ Table 1: Detailed downstream task comparison. MMLU performance is reported 5-shot; all other tasks are 0-shot.


  ## Training Details
@@ -100,8 +103,8 @@ Figure 1. Performance comparison with base model across 12 tasks.
  - Learning rate: 5e-5
  - Batch size: 64
  - 1 Epoch
- - Lora r: 128
  - Dropout: 0
+ - Lora r: 128
  - Lora alpha: 16
  - Beta: 0.05
  - Learn more: [BTLM-3B-8k-chat blog](blogpage)
@@ -181,3 +184,5 @@ Through sheer grit and unwavering commitment they persevere towards achieving th
  - **Human life:** The outputs from this model may or may not align with human values. The risk needs to be thoroughly investigated before deploying this model in a production environment where it can directly impact human life.
  - **Risks and harms:** There may be distributional bias in the [RedPajama dataset](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T) that can manifest in various forms in the downstream model deployment. There are other risks associated with large language models such as amplifying stereotypes, memorizing training data, or revealing private or secure information.

+ ## Acknowledgements
+ We are thankful to all the Cerebras engineers who made this work possible.
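
For context, the `print(generated_text['generated_text'])` line shown in the second hunk header comes from the model card's generation example. Below is a minimal sketch of that usage, assuming the chat checkpoint is published as `cerebras/btlm-3b-8k-chat` and loads with `trust_remote_code=True`; the repo id, prompt, and generation settings are assumptions, not taken from this diff.

```python
# Minimal generation sketch; the repo id, prompt, and settings are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "cerebras/btlm-3b-8k-chat"  # assumed repo id

# The repo carries a custom_code tag, so the custom model class must be trusted.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

prompt = "What is a large language model?"
generated_text = pipe(prompt, max_new_tokens=100, do_sample=False)[0]
print(generated_text['generated_text'])  # the line referenced in the hunk header
```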
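
The hyperparameters reshuffled in the third hunk (learning rate 5e-5, batch size 64, 1 epoch, LoRA r 128, dropout 0, LoRA alpha 16, beta 0.05) read like a LoRA-based DPO fine-tune. The sketch below shows one way those values could map onto Hugging Face `peft` and `trl` config objects; the choice of libraries and every name other than the listed numbers is an assumption.

```python
# Sketch only: one plausible mapping of the card's listed hyperparameters onto
# peft/trl config objects. The use of peft + trl here is an assumption; the
# diff only lists the raw values.
from peft import LoraConfig
from trl import DPOConfig

lora_config = LoraConfig(
    r=128,             # "Lora r: 128"
    lora_alpha=16,     # "Lora alpha: 16"
    lora_dropout=0.0,  # "Dropout: 0"
    task_type="CAUSAL_LM",
)

dpo_args = DPOConfig(
    output_dir="btlm-3b-8k-chat-dpo",   # placeholder path
    learning_rate=5e-5,                 # "Learning rate: 5e-5"
    per_device_train_batch_size=64,     # "Batch size: 64" (global vs. per-device split assumed)
    num_train_epochs=1,                 # "1 Epoch"
    beta=0.05,                          # "Beta: 0.05", the DPO preference temperature
)

# A DPOTrainer would then tie these together with a preference dataset, e.g.:
# trainer = DPOTrainer(model, args=dpo_args, train_dataset=pref_ds,
#                      processing_class=tokenizer, peft_config=lora_config)
# trainer.train()
```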