Update README.md
README.md CHANGED
@@ -42,6 +42,7 @@ datasets:
 - WhiteRabbitNeo/WRN-Chapter-1
 - WhiteRabbitNeo/WRN-Chapter-2
 - winogrande
+---
 
 # A bagel, with everything (except DPO)
 
@@ -63,6 +64,7 @@ __*Only train splits are used, and a decontamination by cosine similarity is per
 
 <details>
 <summary>SFT data sources</summary>
+
 - [ai2_arc](https://huggingface.co/datasets/ai2_arc)
   - Abstraction and reasoning dataset, useful in measuring "intelligence" to a certain extent.
 - [airoboros](https://huggingface.co/datasets/unalignment/spicy-3.1)
@@ -135,6 +137,7 @@ __*Only train splits are used, and a decontamination by cosine similarity is per
 
 <details>
 <summary>DPO data sources</summary>
+
 - [airoboros 3.2](https://huggingface.co/datasets/jondurbin/airoboros-3.2) vs [airoboros m2.0](https://huggingface.co/datasets/jondurbin/airoboros-gpt4-m2.0)
   - The creative/writing tasks from airoboros-2.2.1 were re-generated using gpt4-0314 and a custom prompt to get longer, more creative, less clichè responses for airoboros 3.1, so we can use the shorter/boring version as the "rejected" value and the rerolled response as "chosen"
 - [contextual-dpo](https://huggingface.co/datasets/jondurbin/contextual-dpo-v0.1)
@@ -162,6 +165,18 @@ I also didn't want to randomly select a single prompt format for each item (hopi
 
 This means each epoch of our fine-tune is the equivalent of 3 epochs.
 
+<details>
+<summary>Llama-2 chat (recommended)</summary>
+
+```
+[INST] <<SYS>>
+{system}
+<</SYS>>
+
+{instruction} [/INST]
+```
+</details>
+
 <details>
 <summary>Alpaca (sort of)</summary>
 
@@ -179,7 +194,8 @@ This means each epoch of our fine-tune is the equivalent of 3 epochs.
 
 The main difference here is that because of the dataset formatting and variety of data sources, it would have been much to tedious to add an `### Input:` block, so the inputs are just in the instruction section.
 
-
+<details>
+<summary>Vicuna</summary>
 
 ```
 {system prompt, if provided, randomly defaulting to "A chat between a user and an unbiased, uncensored assistant."}
@@ -188,20 +204,12 @@ This means each epoch of our fine-tune is the equivalent of 3 epochs.
 ```
 </details>
 
-
-
-```text
-{bos}<|im_start|>{role}
-{text}
-<|im_end|>{eos}
-```
-
-### Llama-2 chat (recommended)
-
-```
-[INST] <<SYS>>
-{system}
-<</SYS>>
+<details>
+<summary>ChatML</summary>
 
-
-
+```text
+{bos}<|im_start|>{role}
+{text}
+<|im_end|>{eos}
+```
+</details>