Update README pt2
README.md
CHANGED
@@ -4,9 +4,11 @@ license: apache-2.0
## Introduction
-Cerebrum 7b is a large language model (LLM) created specifically for reasoning tasks. It is
-
## Benchmarking
An overview of Cerebrum 7b performance compared to the reported performance of Mistral 7b and Llama 2 70b on selected benchmarks that require reasoning:
@@ -16,7 +18,7 @@ An overview of Cerebrum 7b performance compared to reported performance Mistral
Notes: 1) Cerebrum evaluated zero-shot, Mistral 8-shot with maj@8, Llama 8-shot; 2) Cerebrum evaluated zero-shot, Mistral 4-shot with maj@4, Llama 4-shot
## Usage
-For optimal performance, Cerebrum should be prompted with an Alpaca-style template that requests
```
<s>A chat between a user and a thinking artificial intelligence assistant. The assistant describes its thought process and gives helpful and detailed answers to the user's questions.
User: Are you conscious?
@@ -38,6 +40,6 @@ with torch.no_grad():
# will generate "Chain of thought prompting works because it helps the model to break down complex problems into smaller, more manageable steps. This allows the model to focus on each step individually and to generate more accurate and relevant responses. Additionally, the intermediate steps can help the model to understand the problem better and to find patterns or connections that it may not have seen before.</s>"
```
-The model ends its turn by generating the
-Cerebrum can be operated at very low temperatures (and specifically temperature 0), which improves performance on tasks that require precise answers. The alignment should be sufficient to avoid repetitions in most cases
@@ -4,9 +4,11 @@ license: apache-2.0
## Introduction
+Cerebrum 7b is a large language model (LLM) created specifically for reasoning tasks. It is based on the Mistral 7b model, fine-tuned on a small custom dataset of native chain-of-thought data and further improved with targeted RLHF (tRLHF), a novel technique for sample-efficient LLM alignment. Unlike numerous other recent fine-tuning approaches, our training pipeline uses fewer than 5000 training prompts and even fewer labeled datapoints for tRLHF.
+The native chain-of-thought approach means that Cerebrum is trained to devise a tactical plan before tackling problems that require thinking. For brainstorming, knowledge-intensive, and creative tasks, Cerebrum will typically omit unnecessarily verbose considerations.
+
+Zero-shot prompted Cerebrum significantly outperforms few-shot prompted Mistral 7b as well as much larger models (such as Llama 2 70b) on a range of tasks that require reasoning, including ARC Challenge, GSM8K, and MATH.
## Benchmarking
An overview of Cerebrum 7b performance compared to the reported performance of Mistral 7b and Llama 2 70b on selected benchmarks that require reasoning:
@@ -16,7 +18,7 @@ An overview of Cerebrum 7b performance compared to reported performance Mistral
Notes: 1) Cerebrum evaluated zero-shot, Mistral 8-shot with maj@8, Llama 8-shot; 2) Cerebrum evaluated zero-shot, Mistral 4-shot with maj@4, Llama 4-shot
## Usage
+For optimal performance, Cerebrum should be prompted with an Alpaca-style template that asks the model to describe its "thought process". Here is what a conversation should look like from the model's point of view:
```
<s>A chat between a user and a thinking artificial intelligence assistant. The assistant describes its thought process and gives helpful and detailed answers to the user's questions.
User: Are you conscious?
@@ -38,6 +40,6 @@ with torch.no_grad():
# will generate "Chain of thought prompting works because it helps the model to break down complex problems into smaller, more manageable steps. This allows the model to focus on each step individually and to generate more accurate and relevant responses. Additionally, the intermediate steps can help the model to understand the problem better and to find patterns or connections that it may not have seen before.</s>"
```
+The model ends its turn by generating the EOS token (`</s>`). In a multi-turn dialogue, this token should be removed from the model's answer before the answer is added back to the conversation.
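
The multi-turn handling described above could be sketched roughly as follows. This is a minimal illustration rather than the authors' code: the helper names `strip_eos` and `build_prompt` are hypothetical, the `AI:` assistant label and the newline separators are assumptions (only the system line and the `User:` label are visible in this diff), and whether `<s>` should be written literally or left to the tokenizer depends on the tokenizer settings.

```python
# Sketch only: helper names, the "AI:" label and the newline separators are
# assumptions; adjust them to match the full template in the README.
SYSTEM_PROMPT = (
    "A chat between a user and a thinking artificial intelligence assistant. "
    "The assistant describes its thought process and gives helpful and detailed "
    "answers to the user's questions."
)
BOS = "<s>"
EOS = "</s>"


def strip_eos(answer: str, eos: str = EOS) -> str:
    """Remove a trailing EOS marker and surrounding whitespace from an answer."""
    answer = answer.strip()
    if answer.endswith(eos):
        answer = answer[: -len(eos)]
    return answer.rstrip()


def build_prompt(history, user_message, user_label="User:", assistant_label="AI:"):
    """Build a prompt from (user, answer) pairs plus the next user message.

    Past answers are passed through strip_eos so that no EOS token from an
    earlier turn ends up inside the prompt.
    """
    lines = [BOS + SYSTEM_PROMPT]
    for past_user, past_answer in history:
        lines.append(f"{user_label} {past_user}")
        lines.append(f"{assistant_label} {strip_eos(past_answer)}")
    lines.append(f"{user_label} {user_message}")
    lines.append(assistant_label)  # the model continues from here
    return "\n".join(lines)
```

For example, `build_prompt([("Are you conscious?", raw_answer)], "Why?")` would yield a prompt in which the earlier answer appears without its trailing `</s>`.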
+Cerebrum can be run at very low temperatures, including temperature 0, which improves performance on tasks that require precise answers. The alignment should be sufficient to avoid repetitions in most cases without a repetition penalty.
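
As a hedged illustration of these decoding settings: with Hugging Face `transformers`, temperature 0 is normally expressed as greedy decoding (`do_sample=False`) rather than a literal `temperature=0`, which the sampling path rejects. The helper name `greedy_generation_kwargs` and the default token budget below are hypothetical and not part of the README.

```python
def greedy_generation_kwargs(max_new_tokens: int = 256) -> dict:
    """Keyword arguments for deterministic (temperature-0 style) decoding.

    Assumption: these are passed to a transformers `generate` call.
    """
    return {
        "do_sample": False,         # greedy decoding, the temperature-0 limit
        "repetition_penalty": 1.0,  # 1.0 means no repetition penalty is applied
        "max_new_tokens": max_new_tokens,
    }
```

A call might then look like `model.generate(**inputs, **greedy_generation_kwargs())`, where `model` and `inputs` are the objects created in the usage snippet above.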