YX-Cerebras committed
Commit 94dd770 • 1 Parent(s): 0b6d290
Update README.md
README.md CHANGED
@@ -16,7 +16,7 @@ license: apache-2.0
 
 # BTLM-3B-8k-chat
 
-BTLM-3B-8k-chat is a chat version of the [BTLM-3B-8K](cerebras/btlm-3b-8k-base) model, trained using the [DPO](https://arxiv.org/abs/2305.18290) method on the [Anthropic-HH-RLHF](Anthropic/hh-rlhf) dataset. The model was specifically trained to align to human preferences and optimized for dialogue use cases.
+BTLM-3B-8k-chat is a chat version of the [BTLM-3B-8K-base](cerebras/btlm-3b-8k-base) model, trained using the [DPO](https://arxiv.org/abs/2305.18290) method on the [Anthropic-HH-RLHF](Anthropic/hh-rlhf) dataset. The model was specifically trained to align to human preferences and optimized for dialogue use cases.
 
 
 
@@ -107,7 +107,7 @@ Table 1: Detailed down-stream tasks comparisons. MMLU task performance is report
 - Lora r: 128
 - Lora alpha: 16
 - Beta: 0.05
-- Learn more: [BTLM-3B-8k-chat blog](
+- Learn more: [BTLM-3B-8k-chat blog](https://www.cerebras.net/blog/fine-tuning-language-models-using-direct-preference-optimization)
 
 
 ## Uses and Limitations
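The hyperparameters listed in the second hunk (LoRA r=128, LoRA alpha=16, beta=0.05) describe a standard LoRA-based DPO setup. As a point of reference, here is a minimal, hypothetical sketch of how those values would slot into an open-source DPO run with the TRL and PEFT libraries. It is not the commit author's or Cerebras' actual training code: only the base model name, the dataset, and the three reported hyperparameters come from the README, while the prompt-splitting heuristic, output directory, and library-version details are assumptions.

```python
# Hypothetical sketch only: NOT the authors' training code. It shows how the
# hyperparameters reported in the README (LoRA r=128, LoRA alpha=16,
# DPO beta=0.05) would map onto a standard TRL DPOTrainer run.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "cerebras/btlm-3b-8k-base"
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_name)


def split_transcript(example):
    # hh-rlhf stores full transcripts; DPO needs prompt/chosen/rejected.
    # Treat everything up to the final assistant turn as the prompt
    # (a simplification for illustration only).
    marker = "\n\nAssistant:"
    cut = example["chosen"].rfind(marker) + len(marker)
    return {
        "prompt": example["chosen"][:cut],
        "chosen": example["chosen"][cut:],
        "rejected": example["rejected"][example["rejected"].rfind(marker) + len(marker):],
    }


dataset = load_dataset("Anthropic/hh-rlhf", split="train").map(split_transcript)

# LoRA adapter sized as reported in the README; custom architectures like
# BTLM may additionally need explicit target_modules.
peft_config = LoraConfig(r=128, lora_alpha=16, task_type="CAUSAL_LM")

# Beta is the DPO temperature; 0.05 is the value reported above.
training_args = DPOConfig(beta=0.05, output_dir="btlm-3b-8k-chat-dpo")

trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,  # `tokenizer=` on older TRL releases
    peft_config=peft_config,  # TRL derives the frozen reference model from PEFT
)
trainer.train()
```

One practical note on the LoRA-plus-DPO combination: when a `peft_config` is passed, TRL can use the base weights with adapters disabled as the implicit reference policy, avoiding a second full copy of the 3B model in memory.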