---
license: apache-2.0
base_model: google/flan-t5-large
tags:
- generated_from_trainer
model-index:
- name: Prompting-NLP-Paper-to-QA-Generation-abstract-only
  results: []
widget:
- text: >-
    Make Question-Answer pair correspond to the following research paper.
    [Abstract] The dominant sequence transduction models are based on complex
    recurrent or convolutional neural networks in an encoder-decoder
    configuration. The best performing models also connect the encoder and
    decoder through an attention mechanism. We propose a new simple network
    architecture, the Transformer, based solely on attention mechanisms,
    dispensing with recurrence and convolutions entirely. Experiments on two
    machine translation tasks show these models to be superior in quality
    while being more parallelizable and requiring significantly less time to
    train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German
    translation task, improving over the existing best results, including
    ensembles, by over 2 BLEU. On the WMT 2014 English-to-French translation
    task, our model establishes a new single-model state-of-the-art BLEU score
    of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the
    training costs of the best models from the literature. We show that the
    Transformer generalizes well to other tasks by applying it successfully to
    English constituency parsing both with large and limited training data.
    Question, Answer:
  example_title: 'Paper: Attention Is All You Need'
---
# Prompting-NLP-Paper-to-QA-Generation-abstract-only
This model is a fine-tuned version of google/flan-t5-large on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 21.0330
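
The snippet below is a minimal inference sketch, not part of the original card: it reuses the prompt format shown in the widget above, while the checkpoint identifier, the truncated abstract placeholder, and the generation settings are assumptions.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Hypothetical repository id / local path; replace with the actual checkpoint location.
checkpoint = "Prompting-NLP-Paper-to-QA-Generation-abstract-only"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# Prompt format taken from the widget example: instruction + "[Abstract] ..." + "Question, Answer:".
abstract = "The dominant sequence transduction models are based on ..."  # paste a full paper abstract here
prompt = (
    "Make Question-Answer pair correspond to the following research paper. "
    f"[Abstract] {abstract} Question, Answer:"
)

inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
# Generation settings are illustrative, not taken from the card.
outputs = model.generate(**inputs, max_new_tokens=128, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```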
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 16
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 184
- num_epochs: 15
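
The listing below is a hedged reconstruction of these settings as `Seq2SeqTrainingArguments` (Transformers 4.35.x API). It is not the author's training script; the `output_dir` and the evaluation/logging strategies are assumptions chosen to match the per-epoch validation losses reported below.

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="prompting-nlp-paper-to-qa",  # assumption: output path is not given in the card
    learning_rate=1e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=16,          # effective (total) train batch size: 16
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=184,
    num_train_epochs=15,
    evaluation_strategy="epoch",             # assumption: matches the per-epoch results table
    logging_strategy="epoch",                # assumption
)
# Adam with betas=(0.9, 0.999) and epsilon=1e-08 is the Trainer's default optimizer setup,
# so no explicit optimizer arguments are needed here.
```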
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| No log        | 0.99  | 46   | 42.8265         |
| 36.8265       | 1.99  | 92   | 41.8626         |
| 36.8265       | 2.98  | 138  | 39.9479         |
| 35.1011       | 3.97  | 184  | 37.2276         |
| 35.1011       | 4.97  | 230  | 33.5552         |
| 28.7673       | 5.96  | 276  | 25.3570         |
| 28.7673       | 6.95  | 322  | 22.8463         |
| 20.3737       | 7.95  | 368  | 22.0063         |
| 20.3737       | 8.94  | 414  | 21.5694         |
| 19.2477       | 9.93  | 460  | 21.3303         |
| 19.2477       | 10.93 | 506  | 21.1698         |
| 18.9724       | 11.92 | 552  | 21.0922         |
| 18.9724       | 12.91 | 598  | 21.0487         |
| 18.9072       | 13.91 | 644  | 21.0365         |
| 18.9072       | 14.9  | 690  | 21.0330         |
### Framework versions
- Transformers 4.35.2
- Pytorch 2.1.0+cu118
- Datasets 2.15.0
- Tokenizers 0.15.0