---
license: apache-2.0
base_model: google/flan-t5-large
tags:
  - generated_from_trainer
model-index:
  - name: Prompting-NLP-Paper-to-QA-Generation-abstract-only
    results: []
widget:
  - text: |-
      Make Question-Answer pair correspond to the following research paper.
       [Abstract] The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles, by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
       Question, Answer:
    example_title: Attention Is All You Need
  - text: |-
      Make Question-Answer pair correspond to the following research paper.
       [Abstract] In this work, we explore prompt tuning, a simple yet effective mechanism for learning soft prompts to condition frozen language models to perform specific downstream tasks. Unlike the discrete text prompts used by GPT-3, soft prompts are learned through backpropagation and can be tuned to incorporate signal from any number of labeled examples. Our end-to-end learned approach outperforms GPT-3's few-shot learning by a large margin. More remarkably, through ablations on model size using T5, we show that prompt tuning becomes more competitive with scale: as models exceed billions of parameters, our method closes the gap and matches the strong performance of model tuning (where all model weights are tuned). This finding is especially relevant in that large models are costly to share and serve, and the ability to reuse one frozen model for multiple downstream tasks can ease this burden. Our method can be seen as a simplification of the recently proposed prefix tuning of Li and Liang (2021), and we provide a comparison to this and other similar approaches. Finally, we show that conditioning a frozen model with soft prompts confers benefits in robustness to domain transfer, as compared to full model tuning.
       Question, Answer:
    example_title: '2104.08691'
---

# Prompting-NLP-Paper-to-QA-Generation-abstract-only

This model is a fine-tuned version of [google/flan-t5-large](https://huggingface.co/google/flan-t5-large) on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 21.0330
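
The widget examples above define the expected prompt template (instruction line, ` [Abstract] …`, then ` Question, Answer:`). A minimal usage sketch is below; the helper names `build_prompt` and `generate_qa`, the repository id, and the generation settings are illustrative assumptions, not part of this model card:

```python
def build_prompt(abstract: str) -> str:
    """Reproduce the prompt template shown in the widget examples above."""
    return (
        "Make Question-Answer pair correspond to the following research paper.\n"
        f" [Abstract] {abstract}\n"
        " Question, Answer:"
    )

def generate_qa(abstract: str,
                model_id: str = "UNIST-Eunchan/Prompting-NLP-Paper-to-QA-Generation-abstract-only") -> str:
    # Import and checkpoint loading happen at call time so build_prompt stays dependency-free.
    from transformers import T5ForConditionalGeneration, T5Tokenizer

    tokenizer = T5Tokenizer.from_pretrained(model_id)
    model = T5ForConditionalGeneration.from_pretrained(model_id)
    inputs = tokenizer(build_prompt(abstract), return_tensors="pt", truncation=True)
    outputs = model.generate(**inputs, max_new_tokens=128)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

Matching the template exactly matters for prompted seq2seq models, since the checkpoint was fine-tuned on this specific instruction format.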

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 16
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 184
- num_epochs: 15
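
The reported total_train_batch_size follows from the per-device batch size and gradient accumulation; a quick sanity check (the `hparams` dict is just a convenience, not part of the training code):

```python
# Hyperparameters copied from the list above.
hparams = {
    "train_batch_size": 1,
    "gradient_accumulation_steps": 16,
}

# The effective (total) train batch size is the per-device batch size
# multiplied by the number of gradient-accumulation steps.
total_train_batch_size = hparams["train_batch_size"] * hparams["gradient_accumulation_steps"]
print(total_train_batch_size)  # 16, matching the reported total_train_batch_size
```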

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| No log        | 0.99  | 46   | 42.8265         |
| 36.8265       | 1.99  | 92   | 41.8626         |
| 36.8265       | 2.98  | 138  | 39.9479         |
| 35.1011       | 3.97  | 184  | 37.2276         |
| 35.1011       | 4.97  | 230  | 33.5552         |
| 28.7673       | 5.96  | 276  | 25.3570         |
| 28.7673       | 6.95  | 322  | 22.8463         |
| 20.3737       | 7.95  | 368  | 22.0063         |
| 20.3737       | 8.94  | 414  | 21.5694         |
| 19.2477       | 9.93  | 460  | 21.3303         |
| 19.2477       | 10.93 | 506  | 21.1698         |
| 18.9724       | 11.92 | 552  | 21.0922         |
| 18.9724       | 12.91 | 598  | 21.0487         |
| 18.9072       | 13.91 | 644  | 21.0365         |
| 18.9072       | 14.9  | 690  | 21.0330         |

### Framework versions

- Transformers 4.35.2
- Pytorch 2.1.0+cu118
- Datasets 2.15.0
- Tokenizers 0.15.0