Update README.md
README.md CHANGED
@@ -132,12 +132,7 @@ $$
 \end{bmatrix}
 $$

-Furthermore, we leverage a large collection of data, including
-- [Natural-Instructions](https://github.com/allenai/natural-instructions)
-- [P3](https://huggingface.co/datasets/Muennighoff/P3)
-- [MMLU-COT](https://github.com/jasonwei20/flan-2/blob/main/mmlu-cot.json)
-- [the Pile](https://huggingface.co/datasets/the_pile)
-
+Furthermore, we leverage a large collection of data, including [Natural-Instructions](https://github.com/allenai/natural-instructions), [P3](https://huggingface.co/datasets/Muennighoff/P3), [MMLU-COT](https://github.com/jasonwei20/flan-2/blob/main/mmlu-cot.json), and [the Pile](https://huggingface.co/datasets/the_pile).
 Specifically, we first conduct training for 2.62 billion tokens using the UL2 loss on the Pile, followed by 0.92 billion tokens with a mixture of the above datasets: 5% of COT, 20% of P3, 20% of NI, and 55% of the Pile.

 ## Hyperparameters
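
The second training stage described above is essentially a weighted sampling mixture over four corpora. As a minimal sketch of how those ratios could be applied when drawing training examples (the dataset keys, the `sample_dataset` helper, and the sampling loop are illustrative assumptions, not code from this repository):

```python
import random

# Second-stage mixture weights as stated in the README:
# 5% MMLU-COT, 20% P3, 20% Natural-Instructions, 55% the Pile.
# The keys below are illustrative names, not identifiers from this repo.
MIXTURE_WEIGHTS = {
    "mmlu_cot": 0.05,
    "p3": 0.20,
    "natural_instructions": 0.20,
    "pile": 0.55,
}

def sample_dataset(rng: random.Random) -> str:
    """Return the name of the dataset the next training example is drawn from."""
    names = list(MIXTURE_WEIGHTS)
    weights = [MIXTURE_WEIGHTS[n] for n in names]
    return rng.choices(names, weights=weights, k=1)[0]

if __name__ == "__main__":
    rng = random.Random(0)
    counts = {name: 0 for name in MIXTURE_WEIGHTS}
    for _ in range(10_000):
        counts[sample_dataset(rng)] += 1
    print(counts)  # empirical counts should land near 500 / 2000 / 2000 / 5500
```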