juewang committed on
Commit eb35ad4
1 Parent(s): 92cffdb

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -82,7 +82,7 @@ widget:
 
 We incorporated a collection of open techniques and datasets to build GPT-JT:
 - GPT-JT was trained based on [GPT-J (6B)](https://huggingface.co/EleutherAI/gpt-j-6B), created by [EleutherAI](https://www.eleuther.ai);
-- We used [UL2](https://github.com/google-research/google-research/tree/master/ul2)'s training objective, which allows it to use bidirectional context to process the prompt;
+- We used [UL2](https://github.com/google-research/google-research/tree/master/ul2)'s training objective, which allows the model to use bidirectional context to process the prompt;
 - The model was trained on a large collection of diverse data, including [Chain-of-Thought (CoT)](https://ai.googleblog.com/2022/05/language-models-perform-reasoning-via.html), [Public Pool of Prompts (P3) dataset](https://huggingface.co/datasets/bigscience/P3), [Natural-Instructions (NI) dataset](https://github.com/allenai/natural-instructions).
 
 With the help of techniques mentioned above, GPT-JT significantly improves the performance of classification tasks over the original GPT-J, and even outperforms most 100B+ parameter models!
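Since the card text above highlights GPT-JT's strength on classification-style prompts but the diff contains no usage snippet, here is a minimal sketch of prompting the model through the Transformers `pipeline` API. The repository id `togethercomputer/GPT-JT-6B-v1` and the example prompt are assumptions for illustration, not part of this commit.

```python
# Minimal usage sketch (not from the model card). The repository id below is an
# assumption; substitute the actual Hub id if it differs.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="togethercomputer/GPT-JT-6B-v1",  # assumed repository id
    torch_dtype="auto",   # load weights in the checkpoint's native precision
    device_map="auto",    # requires `accelerate`; places layers on available devices
)

# GPT-JT responds well to instruction-style classification prompts, e.g. sentiment labeling.
prompt = (
    "The task is to label the sentiment of the review as positive or negative.\n"
    "Review: The movie was absolutely wonderful.\n"
    "Sentiment:"
)

result = generator(prompt, max_new_tokens=5, do_sample=False)
print(result[0]["generated_text"])
```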