juewang committed on
Commit 3becaca
1 Parent(s): 9cbcd8e

Update README.md

Files changed (1)
  1. README.md +1 -6
README.md CHANGED
@@ -132,12 +132,7 @@ $$
  \end{bmatrix}
  $$
 
- Furthermore, we leverage a large collection of data, including NI, P3, COT, the pile:
- - [Natural-Instructions](https://github.com/allenai/natural-instructions)
- - [P3](https://huggingface.co/datasets/Muennighoff/P3)
- - [MMLU-COT](https://github.com/jasonwei20/flan-2/blob/main/mmlu-cot.json)
- - [the Pile](https://huggingface.co/datasets/the_pile)
-
+ Furthermore, we leverage a large collection of data, including [Natural-Instructions](https://github.com/allenai/natural-instructions), [P3](https://huggingface.co/datasets/Muennighoff/P3), [MMLU-COT](https://github.com/jasonwei20/flan-2/blob/main/mmlu-cot.json), and [the Pile](https://huggingface.co/datasets/the_pile)
  Specifically, we first conduct training for 2.62 billion tokens using the UL2 loss on the Pile, followed by 0.92 billion tokens with a mixture of the above datasets: 5% of COT, 20% of P3, 20% of NI, and 55% of the Pile.
 
  ## Hyperparameters
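For readers who want to see the second-stage data mixture from the paragraph above spelled out, here is a minimal Python sketch of sampling sources with the stated weights (5% COT, 20% P3, 20% NI, 55% the Pile) and the two token budgets (2.62B UL2-on-Pile, then 0.92B on the mixture). The dataset keys, constants, and `sample_source` helper are illustrative assumptions, not the repository's actual training code.

```python
import random

# Hypothetical sketch only: names and constants mirror the README text,
# not the authors' actual training pipeline.
MIXTURE_WEIGHTS = {
    "mmlu_cot": 0.05,  # 5%  chain-of-thought data
    "p3":       0.20,  # 20% P3
    "ni":       0.20,  # 20% Natural-Instructions
    "pile":     0.55,  # 55% the Pile
}

STAGE1_TOKENS = 2_620_000_000  # stage 1: UL2 loss on the Pile only
STAGE2_TOKENS = 920_000_000    # stage 2: mixture defined above

def sample_source(rng: random.Random) -> str:
    """Pick a data source for the next batch according to the mixture weights."""
    names, weights = zip(*MIXTURE_WEIGHTS.items())
    return rng.choices(names, weights=weights, k=1)[0]

if __name__ == "__main__":
    rng = random.Random(0)
    counts = {name: 0 for name in MIXTURE_WEIGHTS}
    for _ in range(10_000):
        counts[sample_source(rng)] += 1
    print(counts)  # roughly 500 / 2000 / 2000 / 5500 draws per source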