seonghyeonye committed
Commit: 5083889
1 parent: bca2369
Update README.md
README.md CHANGED

@@ -31,9 +31,9 @@ FLIPPED models are based on [T5](https://huggingface.co/google/t5-v1_1-large), a
 At a high level, the input text along with the output label is fed to the encoder, and the instruction text is produced by the decoder. The model is fine-tuned to autoregressively generate the target. We also feed the input text along with a wrong label, adding an unlikelihood loss so that the model does not produce the proper instruction in that case. Here are our training details.
 Training details:
 - Fine-tuning steps: 5'000
-- Input sequence length:
-- Target sequence length:
-- Batch size:
+- Input sequence length: 512
+- Target sequence length: 128
+- Batch size: 240
 - Optimizer: Adafactor
 - Learning rate: 5e-5
 - Dropout: 0.1
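The paragraph in the diff above describes the flipped objective only in prose; the sketch below is one way it could look in code. It is an illustration under stated assumptions, not this repository's training script: it uses PyTorch and the `transformers` T5 classes, and the `flipped_loss` function, its argument names, and the `ul_weight` knob are hypothetical.

```python
# Minimal sketch (assumption, not the authors' code) of the objective described above:
# a likelihood term for generating the instruction from (input text, correct label),
# plus an unlikelihood term discouraging that instruction from (input text, wrong label).
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer
from transformers.optimization import Adafactor

tokenizer = T5Tokenizer.from_pretrained("google/t5-v1_1-large")
model = T5ForConditionalGeneration.from_pretrained("google/t5-v1_1-large")

def flipped_loss(correct_pair: str, wrong_pair: str, instruction: str, ul_weight: float = 1.0):
    # Encoder input is "<input text> <label>"; lengths mirror the listed 512/128 limits.
    enc = tokenizer(correct_pair, return_tensors="pt", truncation=True, max_length=512)
    enc_wrong = tokenizer(wrong_pair, return_tensors="pt", truncation=True, max_length=512)
    labels = tokenizer(instruction, return_tensors="pt", truncation=True, max_length=128).input_ids

    # Likelihood: autoregressively generate the instruction from the correct pair.
    lm_loss = model(**enc, labels=labels).loss

    # Unlikelihood: given the wrong pair, push down the per-token probability
    # of the same instruction, i.e. minimize -log(1 - p(token)).
    logits = model(**enc_wrong, labels=labels).logits
    token_logp = torch.log_softmax(logits, dim=-1).gather(-1, labels.unsqueeze(-1)).squeeze(-1)
    ul_loss = -torch.log1p(-token_logp.exp().clamp(max=1.0 - 1e-6)).mean()

    return lm_loss + ul_weight * ul_loss

# Adafactor with the fixed learning rate listed above (5e-5); relative_step=False
# so the manual learning rate is actually used.
optimizer = Adafactor(model.parameters(), lr=5e-5, relative_step=False, scale_parameter=False)
```

The `clamp`/`log1p` pair is simply a numerically safe way to compute -log(1 - p); the exact weighting, masking, and batching (240 examples per step, 5'000 steps, as listed) used to train these checkpoints may differ.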