YX-Cerebras committed
Commit f324be0 • 1 Parent(s): 9c49448
Update README.md
README.md CHANGED
@@ -24,7 +24,7 @@ BTLM-3B-8k-chat is a chat version of the [BTLM-3B-8K](cerebras/btlm-3b-8k-base)
 
 BTLM-3B-8k-chat:
 - **Licensed for commercial use** (Apache 2.0).
-- **+2.26% improvement on
+- **+2.26% improvement on 10 downstream tasks and MMLU over BTLM base model**.
 - **Improved chat capabilities**.
 - **Reduced harmfulness and increased helpfulness**.
 
@@ -91,7 +91,10 @@ print(generated_text['generated_text'])
 
 ### Performance vs BTLM-3B-8k model
 ![figure_1_image](./BTLMvsBTLM-Chat.png)
-Figure 1. Performance comparison with base model across 12 tasks.
+Figure 1. Performance comparison with base model across 11 tasks.
+
+![table_1_image](./BTLMvsBTLM-Chat_detail.png)
+Table 1: Detailed downstream task comparisons. MMLU task performance is reported using 5-shot; other tasks are 0-shot.
 
 
 ## Training Details
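The hunk above is anchored on the closing line of the README's generation example (`print(generated_text['generated_text'])`), which this commit does not change. For context, here is a minimal sketch of what such a call could look like with the `transformers` text-generation pipeline; the repo id, prompt, and generation settings are assumptions, and only the final `print` line comes from the hunk context.

```python
from transformers import pipeline

# Hypothetical reconstruction of the README's usage example.
# Only the final print(...) line appears in the diff context above;
# the repo id, prompt, and max_new_tokens value are assumptions.
pipe = pipeline(
    "text-generation",
    model="cerebras/btlm-3b-8k-chat",  # assumed repo id
    trust_remote_code=True,            # BTLM ships custom model code
)

prompt = "What is a manometer?"
generated_text = pipe(prompt, max_new_tokens=50)[0]
print(generated_text['generated_text'])
```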
@@ -100,8 +103,8 @@ Figure 1. Performance comparison with base model across 12 tasks.
 - Learning rate: 5e-5
 - Batch size: 64
 - 1 Epoch
-- Lora r: 128
 - Dropout: 0
+- Lora r: 128
 - Lora alpha: 16
 - Beta: 0.05
 - Learn more: [BTLM-3B-8k-chat blog](blogpage)
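The Training Details hunk only moves the `Lora r` entry; the hyperparameters themselves are unchanged. For readers who want to see how those values could map onto a concrete fine-tuning setup, below is a minimal sketch assuming Hugging Face `peft` and `trl`. Reading `Beta: 0.05` as a DPO-style preference-tuning coefficient is an inference, not something the README states, and the model, dataset, and `output_dir` are hypothetical.

```python
from peft import LoraConfig
from trl import DPOConfig

# Values below are copied from the Training Details list in the diff.
# Mapping them onto peft + trl is an assumption; the README does not
# name the training framework or objective.
peft_config = LoraConfig(
    r=128,             # Lora r
    lora_alpha=16,     # Lora alpha
    lora_dropout=0.0,  # Dropout
    task_type="CAUSAL_LM",
)

training_args = DPOConfig(
    learning_rate=5e-5,              # Learning rate
    per_device_train_batch_size=64,  # Batch size (global vs. per-device not specified)
    num_train_epochs=1,              # 1 Epoch
    beta=0.05,                       # Beta
    output_dir="btlm-3b-8k-chat-dpo",  # hypothetical
)

# A trainer wired up with these configs would then look roughly like:
# trainer = DPOTrainer(model=model, args=training_args,
#                      train_dataset=preference_pairs, peft_config=peft_config)
# trainer.train()
```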
@@ -181,3 +184,5 @@ Through sheer grit and unwavering commitment they persevere towards achieving th
 - **Human life:** The outputs from this model may or may not align with human values. The risk needs to be thoroughly investigated before deploying this model in a production environment where it can directly impact human life.
 - **Risks and harms:** There may be distributional bias in the [RedPajama dataset](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T) that can manifest in various forms in the downstream model deployment. There are other risks associated with large language models such as amplifying stereotypes, memorizing training data, or revealing private or secure information.
 
+## Acknowledgements
+We are thankful to all Cerebras engineers that made this work possible.