---
license: apache-2.0
base_model: google/flan-t5-large
tags:
- generated_from_trainer
model-index:
- name: Prompting-NLP-Paper-to-QA-Generation-abstract-only
  results: []
widget:
- text: >-
    Make Question-Answer pair correspond to the following research paper.
    [Abstract] The dominant sequence transduction models are based on complex
    recurrent or convolutional neural networks in an encoder-decoder
    configuration. The best performing models also connect the encoder and
    decoder through an attention mechanism. We propose a new simple network
    architecture, the Transformer, based solely on attention mechanisms,
    dispensing with recurrence and convolutions entirely. Experiments on two
    machine translation tasks show these models to be superior in quality
    while being more parallelizable and requiring significantly less time to
    train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German
    translation task, improving over the existing best results, including
    ensembles, by over 2 BLEU. On the WMT 2014 English-to-French translation
    task, our model establishes a new single-model state-of-the-art BLEU score
    of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the
    training costs of the best models from the literature. We show that the
    Transformer generalizes well to other tasks by applying it successfully to
    English constituency parsing both with large and limited training data.
    Question, Answer:
  example_title: 'Paper: Attention Is All You Need'
---
# Prompting-NLP-Paper-to-QA-Generation-abstract-only
This model is a fine-tuned version of google/flan-t5-large on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 21.0330
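
The snippet below is a minimal inference sketch, not part of the original card: it reuses the prompt format shown in the widget above, while the checkpoint identifier, the truncated abstract placeholder, and the generation settings are assumptions.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Hypothetical repository id / local path; replace with the actual checkpoint location.
checkpoint = "Prompting-NLP-Paper-to-QA-Generation-abstract-only"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# Prompt format taken from the widget example: instruction + "[Abstract] ..." + "Question, Answer:".
abstract = "The dominant sequence transduction models are based on ..."  # paste a full paper abstract here
prompt = (
    "Make Question-Answer pair correspond to the following research paper. "
    f"[Abstract] {abstract} Question, Answer:"
)

inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
# Generation settings are illustrative, not taken from the card.
outputs = model.generate(**inputs, max_new_tokens=128, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```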
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 16
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 184
- num_epochs: 15
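
The listing below is a hedged reconstruction of these settings as `Seq2SeqTrainingArguments` (Transformers 4.35.x API). It is not the author's training script; the `output_dir` and the evaluation/logging strategies are assumptions chosen to match the per-epoch validation losses reported below.

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="prompting-nlp-paper-to-qa",  # assumption: output path is not given in the card
    learning_rate=1e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=16,          # effective (total) train batch size: 16
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=184,
    num_train_epochs=15,
    evaluation_strategy="epoch",             # assumption: matches the per-epoch results table
    logging_strategy="epoch",                # assumption
)
# Adam with betas=(0.9, 0.999) and epsilon=1e-08 is the Trainer's default optimizer setup,
# so no explicit optimizer arguments are needed here.
```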
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| No log        | 0.99  | 46   | 42.8265         |
| 36.8265       | 1.99  | 92   | 41.8626         |
| 36.8265       | 2.98  | 138  | 39.9479         |
| 35.1011       | 3.97  | 184  | 37.2276         |
| 35.1011       | 4.97  | 230  | 33.5552         |
| 28.7673       | 5.96  | 276  | 25.3570         |
| 28.7673       | 6.95  | 322  | 22.8463         |
| 20.3737       | 7.95  | 368  | 22.0063         |
| 20.3737       | 8.94  | 414  | 21.5694         |
| 19.2477       | 9.93  | 460  | 21.3303         |
| 19.2477       | 10.93 | 506  | 21.1698         |
| 18.9724       | 11.92 | 552  | 21.0922         |
| 18.9724       | 12.91 | 598  | 21.0487         |
| 18.9072       | 13.91 | 644  | 21.0365         |
| 18.9072       | 14.9  | 690  | 21.0330         |
### Framework versions
- Transformers 4.35.2
- Pytorch 2.1.0+cu118
- Datasets 2.15.0
- Tokenizers 0.15.0