license: apache-2.0
base_model: google/flan-t5-large
tags:
  - generated_from_trainer
model-index:
  - name: Prompting-NLP-Paper-to-QA-Generation-abstract-only
    results: []
widget:
  - text: >-
      Make Question-Answer pair correspond to the following research paper.
      [Abstract] The dominant sequence transduction models are based on complex
      recurrent or convolutional neural networks in an encoder-decoder
      configuration. The best performing models also connect the encoder and
      decoder through an attention mechanism. We propose a new simple network
      architecture, the Transformer, based solely on attention mechanisms,
      dispensing with recurrence and convolutions entirely. Experiments on two
      machine translation tasks show these models to be superior in quality
      while being more parallelizable and requiring significantly less time to
      train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German
      translation task, improving over the existing best results, including
      ensembles by over 2 BLEU. On the WMT 2014 English-to-French translation
      task, our model establishes a new single-model state-of-the-art BLEU score
      of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the
      training costs of the best models from the literature. We show that the
      Transformer generalizes well to other tasks by applying it successfully to
      English constituency parsing both with large and limited training data.
      Question, Answer:
  - example_title: 'Paper: Attention Is All You Need'

Prompting-NLP-Paper-to-QA-Generation-abstract-only

This model is a fine-tuned version of google/flan-t5-large on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 21.0330

Model description

This model is fine-tuned from google/flan-t5-large to generate a question-answer pair for an NLP research paper from its abstract alone, using the prompt format shown in the widget example above ("Make Question-Answer pair correspond to the following research paper. [Abstract] ... Question, Answer:").
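Below is a minimal usage sketch, assuming the checkpoint is hosted under the repository id UNIST-Eunchan/Prompting-NLP-Paper-to-QA-Generation-abstract-only (inferred from the model name; adjust the id if it differs):

```python
# Minimal sketch: load the fine-tuned checkpoint and generate a QA pair from an abstract.
# The repository id below is an assumption inferred from the model name.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "UNIST-Eunchan/Prompting-NLP-Paper-to-QA-Generation-abstract-only"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

abstract = "The dominant sequence transduction models are based on complex recurrent ..."

# Prompt format taken from the widget example in the metadata above.
prompt = (
    "Make Question-Answer pair correspond to the following research paper. "
    f"[Abstract] {abstract} Question, Answer:"
)

inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=1024)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The prompt string mirrors the widget example in the metadata block, so the hosted inference widget and this snippet exercise the model in the same way.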

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training; a corresponding configuration sketch follows the list:

  • learning_rate: 1e-05
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 184
  • num_epochs: 15
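
A hedged sketch of how these settings could be expressed with Hugging Face Seq2SeqTrainingArguments (the output directory and evaluation strategy are assumptions, not taken from the original training script):

```python
# Sketch only: reproduces the listed hyperparameters with Seq2SeqTrainingArguments.
# output_dir and evaluation_strategy are illustrative assumptions.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="Prompting-NLP-Paper-to-QA-Generation-abstract-only",  # placeholder
    learning_rate=1e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=16,   # effective train batch size: 1 x 16 = 16
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=184,
    num_train_epochs=15,
    evaluation_strategy="epoch",      # assumption: the results table reports one eval per epoch
)
```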

Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| No log        | 0.99  | 46   | 42.8265         |
| 36.8265       | 1.99  | 92   | 41.8626         |
| 36.8265       | 2.98  | 138  | 39.9479         |
| 35.1011       | 3.97  | 184  | 37.2276         |
| 35.1011       | 4.97  | 230  | 33.5552         |
| 28.7673       | 5.96  | 276  | 25.3570         |
| 28.7673       | 6.95  | 322  | 22.8463         |
| 20.3737       | 7.95  | 368  | 22.0063         |
| 20.3737       | 8.94  | 414  | 21.5694         |
| 19.2477       | 9.93  | 460  | 21.3303         |
| 19.2477       | 10.93 | 506  | 21.1698         |
| 18.9724       | 11.92 | 552  | 21.0922         |
| 18.9724       | 12.91 | 598  | 21.0487         |
| 18.9072       | 13.91 | 644  | 21.0365         |
| 18.9072       | 14.9  | 690  | 21.0330         |

Framework versions

  • Transformers 4.35.2
  • Pytorch 2.1.0+cu118
  • Datasets 2.15.0
  • Tokenizers 0.15.0