seonghyeonye committed
Commit: 5083889
1 parent: bca2369
Update README.md
README.md CHANGED

@@ -31,9 +31,9 @@ FLIPPED models are based on [T5](https://huggingface.co/google/t5-v1_1-large), a
 At a high level, the input text along with the output label is fed to the encoder, and the instruction text is produced by the decoder. The model is fine-tuned to autoregressively generate the target. We also feed the input text along with a wrong label, adding an unlikelihood loss so that the model does not produce the proper instruction in that case. Here are our training details.
 Training details:
 - Fine-tuning steps: 5'000
-- Input sequence length:
-- Target sequence length:
-- Batch size:
+- Input sequence length: 512
+- Target sequence length: 128
+- Batch size: 240
 - Optimizer: Adafactor
 - Learning rate: 5e-5
 - Dropout: 0.1
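The paragraph in the diff above describes the flipped objective only in prose; the sketch below is one way it could look in code. It is an illustration under stated assumptions, not this repository's training script: it uses PyTorch and the `transformers` T5 classes, and the `flipped_loss` function, its argument names, and the `ul_weight` knob are hypothetical.

```python
# Minimal sketch (assumption, not the authors' code) of the objective described above:
# a likelihood term for generating the instruction from (input text, correct label),
# plus an unlikelihood term discouraging that instruction from (input text, wrong label).
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer
from transformers.optimization import Adafactor

tokenizer = T5Tokenizer.from_pretrained("google/t5-v1_1-large")
model = T5ForConditionalGeneration.from_pretrained("google/t5-v1_1-large")

def flipped_loss(correct_pair: str, wrong_pair: str, instruction: str, ul_weight: float = 1.0):
    # Encoder input is "<input text> <label>"; lengths mirror the listed 512/128 limits.
    enc = tokenizer(correct_pair, return_tensors="pt", truncation=True, max_length=512)
    enc_wrong = tokenizer(wrong_pair, return_tensors="pt", truncation=True, max_length=512)
    labels = tokenizer(instruction, return_tensors="pt", truncation=True, max_length=128).input_ids

    # Likelihood: autoregressively generate the instruction from the correct pair.
    lm_loss = model(**enc, labels=labels).loss

    # Unlikelihood: given the wrong pair, push down the per-token probability
    # of the same instruction, i.e. minimize -log(1 - p(token)).
    logits = model(**enc_wrong, labels=labels).logits
    token_logp = torch.log_softmax(logits, dim=-1).gather(-1, labels.unsqueeze(-1)).squeeze(-1)
    ul_loss = -torch.log1p(-token_logp.exp().clamp(max=1.0 - 1e-6)).mean()

    return lm_loss + ul_weight * ul_loss

# Adafactor with the fixed learning rate listed above (5e-5); relative_step=False
# so the manual learning rate is actually used.
optimizer = Adafactor(model.parameters(), lr=5e-5, relative_step=False, scale_parameter=False)
```

The `clamp`/`log1p` pair is simply a numerically safe way to compute -log(1 - p); the exact weighting, masking, and batching (240 examples per step, 5'000 steps, as listed) used to train these checkpoints may differ.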