seonghyeonye committed on
Commit 5083889
1 Parent(s): bca2369

Update README.md

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -31,9 +31,9 @@ FLIPPED models are based on [T5](https://huggingface.co/google/t5-v1_1-large), a
  At a high level, the input text along with output label is fed to the encoder and the instruction text is produced by the decoder. The model is fine-tuned to autoregressively generate the target. We also feed input text along with a wrong input, adding an unlikelihood loss in order not to make model produce the proper instruction in that case. Here are our training details.
  Training details:
  - Fine-tuning steps: 5'000
- - Input sequence length: 384(512 for 3B)
- - Target sequence length: 64
- - Batch size: 1
+ - Input sequence length: 512
+ - Target sequence length: 128
+ - Batch size: 240
  - Optimizer: Adafactor
  - Learning rate: 5e-5
  - Dropout: 0.1
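
The README context line in this diff describes two training signals: a likelihood term for generating the instruction when the correct label is given, and an unlikelihood term that discourages generating it when a wrong label is given. The following is a minimal sketch of how the two terms could be combined, not the authors' implementation. The function names, the `ul_weight` parameter, and the use of plain per-token probabilities (rather than model logits) are assumptions made for illustration.

```python
import math

# Hypothetical helper names; per-token probabilities stand in for the
# decoder's probability of each instruction token.

def likelihood_loss(token_probs, eps=1e-12):
    """Negative log-likelihood of the instruction tokens,
    conditioned on (input text, correct label)."""
    return -sum(math.log(max(p, eps)) for p in token_probs)

def unlikelihood_loss(token_probs, eps=1e-12):
    """Penalty for assigning probability to the instruction tokens,
    conditioned on (input text, wrong label): -sum(log(1 - p))."""
    return -sum(math.log(max(1.0 - p, eps)) for p in token_probs)

def combined_loss(probs_correct, probs_wrong, ul_weight=1.0):
    """Sum of both terms; ul_weight is an assumed weighting, not from the README."""
    return likelihood_loss(probs_correct) + ul_weight * unlikelihood_loss(probs_wrong)

if __name__ == "__main__":
    # The loss is lower when the wrong-label pass assigns low probability
    # to the instruction tokens.
    print(combined_loss([0.9, 0.8], [0.1, 0.2]))
    print(combined_loss([0.9, 0.8], [0.9, 0.8]))
```

In this sketch the unlikelihood term is zero when the model assigns zero probability to the instruction under a wrong label, and it grows as that probability approaches one.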