Update README.md
README.md CHANGED
@@ -132,12 +132,7 @@ $$
 \end{bmatrix}
 $$

-Furthermore, we leverage a large collection of data, including
-- [Natural-Instructions](https://github.com/allenai/natural-instructions)
-- [P3](https://huggingface.co/datasets/Muennighoff/P3)
-- [MMLU-COT](https://github.com/jasonwei20/flan-2/blob/main/mmlu-cot.json)
-- [the Pile](https://huggingface.co/datasets/the_pile)
-
+Furthermore, we leverage a large collection of data, including [Natural-Instructions](https://github.com/allenai/natural-instructions), [P3](https://huggingface.co/datasets/Muennighoff/P3), [MMLU-COT](https://github.com/jasonwei20/flan-2/blob/main/mmlu-cot.json), and [the Pile](https://huggingface.co/datasets/the_pile).
 Specifically, we first conduct training for 2.62 billion tokens using the UL2 loss on the Pile, followed by 0.92 billion tokens with a mixture of the above datasets: 5% of COT, 20% of P3, 20% of NI, and 55% of the Pile.

 ## Hyperparameters
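
The second training stage described above is essentially a weighted sampling mixture over four corpora. As a minimal sketch of how those ratios could be applied when drawing training examples (the dataset keys, the `sample_dataset` helper, and the sampling loop are illustrative assumptions, not code from this repository):

```python
import random

# Second-stage mixture weights as stated in the README:
# 5% MMLU-COT, 20% P3, 20% Natural-Instructions, 55% the Pile.
# The keys below are illustrative names, not identifiers from this repo.
MIXTURE_WEIGHTS = {
    "mmlu_cot": 0.05,
    "p3": 0.20,
    "natural_instructions": 0.20,
    "pile": 0.55,
}

def sample_dataset(rng: random.Random) -> str:
    """Return the name of the dataset the next training example is drawn from."""
    names = list(MIXTURE_WEIGHTS)
    weights = [MIXTURE_WEIGHTS[n] for n in names]
    return rng.choices(names, weights=weights, k=1)[0]

if __name__ == "__main__":
    rng = random.Random(0)
    counts = {name: 0 for name in MIXTURE_WEIGHTS}
    for _ in range(10_000):
        counts[sample_dataset(rng)] += 1
    print(counts)  # empirical counts should land near 500 / 2000 / 2000 / 5500
```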