Update README.md
seonghyeonye committed · Commit 6d785d6 · 1 Parent(s): 77a37a0

README.md CHANGED
@@ -1,7 +1,7 @@
 **Official repository**: [seonghyeonye/Flipped-Learning](https://github.com/seonghyeonye/Flipped-Learning)
 # Model Description
 FLIPPED uses a unique meta-learning method to show zero-shot task generalization on classification natural language prompts, outperforming GPT-3 and T0-11B on many tasks with a 4x smaller scale.
-It is a series of encoder-decoder model trained on a numerous classification dataset. We show inputs and its corresponding outputs of each instances in each dataset to FLIPPED, and train it to generate its possible instruction. We add
+It is a series of encoder-decoder model trained on a numerous classification dataset. We show inputs and its corresponding outputs of each instances in each dataset to FLIPPED, and train it to generate its possible instruction. We add unlikelihood loss in order **not** to generate the instruction when given the same input, but a wrong output. To obtain FLIPPED, we fine-tune a T5 model in a given scale on a multitask mixture covering many different classification NLP tasks.
 # Intended uses
 You can use the models to perform inference on tasks by specifying your input-output NLP query in a "input: {input}\noutput: {output}" form , and the model will predict the instruction. For example, You can try
 *"input: <extra_id_0> this is the best cast iron skillet you will ever buy<extra_id_1>\noutput: Positive"*
@@ -28,12 +28,12 @@ We also provide a quick [Jupyter Notebook](https://github.com/seonghyeonye/Flipp
 
 # Training procedure
 FLIPPED models are based on [T5](https://huggingface.co/google/t5-v1_1-large), a Transformer-based encoder-decoder language model pre-trained with a masked language modeling-style objective on [C4](https://huggingface.co/datasets/c4).
-At a high level, the input text along with output label is fed to the encoder and the instruction text is produced by the decoder. The model is fine-tuned to autoregressively generate the target. We also feed input text along with a wrong input, adding an
+At a high level, the input text along with output label is fed to the encoder and the instruction text is produced by the decoder. The model is fine-tuned to autoregressively generate the target. We also feed input text along with a wrong input, adding an unlikelihood loss in order not to make model produce the proper instruction in that case. Here are our training details.
 Training details:
 - Fine-tuning steps: 5'000
-- Input sequence length: 384
+- Input sequence length: 384
 - Target sequence length: 64
-- Batch size:
+- Batch size: 240
 - Optimizer: Adafactor
 - Learning rate: 5e-5
 - Dropout: 0.1
@@ -82,14 +82,10 @@ We evaluate the robustness of models on following datasets with changing the out
 The template name we used can be found in the [promptsource template library](https://github.com/bigscience-workshop/promptsource/tree/main/promptsource/templates).
 # BibTeX entry and citation info
 ```bibtex
-@
-
-
-
-
-title = {Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners},
-publisher = {arXiv},
-year = {2022},
-copyright = {Creative Commons Attribution 4.0 International}
+@article{ye2022guess,
+title={Guess the Instruction! Making Language Models Stronger Zero-Shot Learners},
+author={Ye, Seonghyeon and Kim, Doyoung and Jang, Joel and Shin, Joongbo and Seo, Minjoon},
+journal={arXiv preprint arXiv:2210.02969},
+year={2022}
 }
 ```
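
The query format described under "Intended uses" in the README can be sketched as a small helper. This is only an illustration: the function name `build_flipped_query` is hypothetical and not part of the repository. It assumes that `\n` in the README denotes a newline character and that the `<extra_id_0>` / `<extra_id_1>` sentinels wrap the input text exactly as in the README's single example.

```python
def build_flipped_query(input_text: str, output_label: str) -> str:
    """Format an input/label pair in the "input: ...\\noutput: ..." form.

    Hypothetical helper. The sentinel placement mirrors the one example
    given in the README; treating it as the general rule is an assumption.
    """
    wrapped = f"<extra_id_0> {input_text}<extra_id_1>"
    return f"input: {wrapped}\noutput: {output_label}"


# Rebuild the README's example query from its two parts.
query = build_flipped_query(
    "this is the best cast iron skillet you will ever buy",
    "Positive",
)
print(query)
```

The resulting string would then be passed to the model's tokenizer; per the README, the model is expected to generate a candidate instruction for the pair.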
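
The training objective described in the README (generate the instruction for a correct input/label pair, and avoid generating it for a wrong label) can be sketched numerically. The code below assumes the common token-level form of unlikelihood loss, `-log(1 - p)`, averaged over instruction tokens and added to the usual negative log-likelihood with equal weight. The README does not state the exact weighting or normalization, so these choices and all function names are assumptions rather than the authors' implementation.

```python
import math
from typing import Sequence


def likelihood_loss(token_probs: Sequence[float], eps: float = 1e-8) -> float:
    """Mean negative log-likelihood of the instruction tokens."""
    return -sum(math.log(max(p, eps)) for p in token_probs) / len(token_probs)


def unlikelihood_loss(token_probs: Sequence[float], eps: float = 1e-8) -> float:
    """Mean -log(1 - p): penalizes high probability on the instruction tokens."""
    return -sum(math.log(max(1.0 - p, eps)) for p in token_probs) / len(token_probs)


def flipped_style_loss(
    probs_given_correct_label: Sequence[float],
    probs_given_wrong_label: Sequence[float],
) -> float:
    """Sum of both terms with equal weight (the weighting is an assumption)."""
    return likelihood_loss(probs_given_correct_label) + unlikelihood_loss(
        probs_given_wrong_label
    )


# Toy per-token probabilities that a model assigns to the instruction tokens.
good = flipped_style_loss([0.9, 0.8], [0.1, 0.2])  # desired behaviour
bad = flipped_style_loss([0.9, 0.8], [0.9, 0.8])   # ignores the label
print(good < bad)
```

In this sketch the loss is lower when the instruction is likely under the correct label and unlikely under the wrong one, which is the behaviour the README describes.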