Update README.md
README.md CHANGED
@@ -42,6 +42,7 @@ datasets:
 - WhiteRabbitNeo/WRN-Chapter-1
 - WhiteRabbitNeo/WRN-Chapter-2
 - winogrande
+---
 
 # A bagel, with everything (except DPO)
 
@@ -63,6 +64,7 @@ __*Only train splits are used, and a decontamination by cosine similarity is per
 
 <details>
 <summary>SFT data sources</summary>
+
 - [ai2_arc](https://huggingface.co/datasets/ai2_arc)
   - Abstraction and reasoning dataset, useful in measuring "intelligence" to a certain extent.
 - [airoboros](https://huggingface.co/datasets/unalignment/spicy-3.1)
@@ -135,6 +137,7 @@ __*Only train splits are used, and a decontamination by cosine similarity is per
 
 <details>
 <summary>DPO data sources</summary>
+
 - [airoboros 3.2](https://huggingface.co/datasets/jondurbin/airoboros-3.2) vs [airoboros m2.0](https://huggingface.co/datasets/jondurbin/airoboros-gpt4-m2.0)
   - The creative/writing tasks from airoboros-2.2.1 were re-generated using gpt4-0314 and a custom prompt to get longer, more creative, less clichè responses for airoboros 3.1, so we can use the shorter/boring version as the "rejected" value and the rerolled response as "chosen"
 - [contextual-dpo](https://huggingface.co/datasets/jondurbin/contextual-dpo-v0.1)
@@ -162,6 +165,18 @@ I also didn't want to randomly select a single prompt format for each item (hopi
 
 This means each epoch of our fine-tune is the equivalent of 3 epochs.
 
+<details>
+<summary>Llama-2 chat (recommended)</summary>
+
+```
+[INST] <<SYS>>
+{system}
+<</SYS>>
+
+{instruction} [/INST]
+```
+</details>
+
 <details>
 <summary>Alpaca (sort of)</summary>
 
@@ -179,7 +194,8 @@ This means each epoch of our fine-tune is the equivalent of 3 epochs.
 
 The main difference here is that because of the dataset formatting and variety of data sources, it would have been much to tedious to add an `### Input:` block, so the inputs are just in the instruction section.
 
-
+<details>
+<summary>Vicuna</summary>
 
 ```
 {system prompt, if provided, randomly defaulting to "A chat between a user and an unbiased, uncensored assistant."}
@@ -188,20 +204,12 @@ This means each epoch of our fine-tune is the equivalent of 3 epochs.
 ```
 </details>
 
-
-
-```text
-{bos}<|im_start|>{role}
-{text}
-<|im_end|>{eos}
-```
-
-### Llama-2 chat (recommended)
-
-```
-[INST] <<SYS>>
-{system}
-<</SYS>>
+<details>
+<summary>ChatML</summary>
 
-
-
+```text
+{bos}<|im_start|>{role}
+{text}
+<|im_end|>{eos}
+```
+</details>