---
license: apache-2.0
base_model: google/flan-t5-large
tags:
  - generated_from_trainer
model-index:
  - name: Prompting-NLP-Paper-to-QA-Generation-abstract-only
    results: []
widget:
  - text: |-
      Make Question-Answer pair correspond to the following research paper.
       [Abstract] The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles, by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
       Question, Answer:
    example_title: Attention Is All You Need
  - text: |-
      Make Question-Answer pair correspond to the following research paper.
       [Abstract] In this work, we explore prompt tuning, a simple yet effective mechanism for learning soft prompts to condition frozen language models to perform specific downstream tasks. Unlike the discrete text prompts used by GPT-3, soft prompts are learned through backpropagation and can be tuned to incorporate signal from any number of labeled examples. Our end-to-end learned approach outperforms GPT-3's few-shot learning by a large margin. More remarkably, through ablations on model size using T5, we show that prompt tuning becomes more competitive with scale: as models exceed billions of parameters, our method closes the gap and matches the strong performance of model tuning (where all model weights are tuned). This finding is especially relevant in that large models are costly to share and serve, and the ability to reuse one frozen model for multiple downstream tasks can ease this burden. Our method can be seen as a simplification of the recently proposed prefix tuning of Li and Liang (2021), and we provide a comparison to this and other similar approaches. Finally, we show that conditioning a frozen model with soft prompts confers benefits in robustness to domain transfer, as compared to full model tuning.
       Question, Answer:
    example_title: '2104.08691'
---

# Prompting-NLP-Paper-to-QA-Generation-abstract-only

This model is a fine-tuned version of [google/flan-t5-large](https://huggingface.co/google/flan-t5-large) on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 21.0330
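
The widget examples above define the expected prompt template (instruction line, ` [Abstract] …`, then ` Question, Answer:`). A minimal usage sketch is below; the helper names `build_prompt` and `generate_qa`, the repository id, and the generation settings are illustrative assumptions, not part of this model card:

```python
def build_prompt(abstract: str) -> str:
    """Reproduce the prompt template shown in the widget examples above."""
    return (
        "Make Question-Answer pair correspond to the following research paper.\n"
        f" [Abstract] {abstract}\n"
        " Question, Answer:"
    )

def generate_qa(abstract: str,
                model_id: str = "UNIST-Eunchan/Prompting-NLP-Paper-to-QA-Generation-abstract-only") -> str:
    # Import and checkpoint loading happen at call time so build_prompt stays dependency-free.
    from transformers import T5ForConditionalGeneration, T5Tokenizer

    tokenizer = T5Tokenizer.from_pretrained(model_id)
    model = T5ForConditionalGeneration.from_pretrained(model_id)
    inputs = tokenizer(build_prompt(abstract), return_tensors="pt", truncation=True)
    outputs = model.generate(**inputs, max_new_tokens=128)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

Matching the template exactly matters for prompted seq2seq models, since the checkpoint was fine-tuned on this specific instruction format.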

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 16
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 184
- num_epochs: 15
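
The reported total_train_batch_size follows from the per-device batch size and gradient accumulation; a quick sanity check (the `hparams` dict is just a convenience, not part of the training code):

```python
# Hyperparameters copied from the list above.
hparams = {
    "train_batch_size": 1,
    "gradient_accumulation_steps": 16,
}

# The effective (total) train batch size is the per-device batch size
# multiplied by the number of gradient-accumulation steps.
total_train_batch_size = hparams["train_batch_size"] * hparams["gradient_accumulation_steps"]
print(total_train_batch_size)  # 16, matching the reported total_train_batch_size
```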

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| No log        | 0.99  | 46   | 42.8265         |
| 36.8265       | 1.99  | 92   | 41.8626         |
| 36.8265       | 2.98  | 138  | 39.9479         |
| 35.1011       | 3.97  | 184  | 37.2276         |
| 35.1011       | 4.97  | 230  | 33.5552         |
| 28.7673       | 5.96  | 276  | 25.3570         |
| 28.7673       | 6.95  | 322  | 22.8463         |
| 20.3737       | 7.95  | 368  | 22.0063         |
| 20.3737       | 8.94  | 414  | 21.5694         |
| 19.2477       | 9.93  | 460  | 21.3303         |
| 19.2477       | 10.93 | 506  | 21.1698         |
| 18.9724       | 11.92 | 552  | 21.0922         |
| 18.9724       | 12.91 | 598  | 21.0487         |
| 18.9072       | 13.91 | 644  | 21.0365         |
| 18.9072       | 14.9  | 690  | 21.0330         |

### Framework versions

- Transformers 4.35.2
- Pytorch 2.1.0+cu118
- Datasets 2.15.0
- Tokenizers 0.15.0