Update README.md
Browse files
README.md
CHANGED
@@ -28,9 +28,9 @@ Inspired by and featuring the Reflection Tuning technique pioneered by Matt Shum
|
|
28 |
|
29 |
**As per the inspiring model "mattshumer/Reflection-Llama-3.1-70B" (this mode was not used in the training process nor as a foundational model, but only served as inspiration) :**
|
30 |
|
31 |
-
|
32 |
|
33 |
-
During sampling, the model will start by outputting reasoning inside
|
34 |
|
35 |
This enables the model to separate its internal thoughts and reasoning from its final answer, improving the experience for the user.
|
36 |
|
@@ -52,7 +52,7 @@ You are a world-class AI system, capable of complex reasoning and reflection. Re
|
|
52 |
|
53 |
what is 2+2?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
|
54 |
|
55 |
-
|
56 |
|
57 |
This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
|
58 |
|
|
|
28 |
|
29 |
**As per the inspiring model "mattshumer/Reflection-Llama-3.1-70B" (this mode was not used in the training process nor as a foundational model, but only served as inspiration) :**
|
30 |
|
31 |
+
```
|
32 |
|
33 |
+
During sampling, the model will start by outputting reasoning inside <thinking> and </thinking> tags, and then once it is satisfied with its reasoning, it will output the final answer inside <output> and </output> tags. Each of these tags are special tokens, trained into the model.
|
34 |
|
35 |
This enables the model to separate its internal thoughts and reasoning from its final answer, improving the experience for the user.
|
36 |
|
|
|
52 |
|
53 |
what is 2+2?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
|
54 |
|
55 |
+
```
|
56 |
|
57 |
This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
|
58 |
|