---
license: apache-2.0
base_model: google/flan-t5-large
tags:
- generated_from_trainer
model-index:
- name: Prompting-NLP-Paper-to-QA-Generation-abstract-only
  results: []
widget:
- text: "Make Question-Answer pair correspond to the following research paper.\n [Abstract] The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data. \n Question, Answer:"
  example_title: "Attention Is All You Need"
- text: "Make Question-Answer pair correspond to the following research paper.\n [Abstract] In this work, we explore prompt tuning, a simple yet effective mechanism for learning soft prompts to condition frozen language models to perform specific downstream tasks. Unlike the discrete text prompts used by GPT-3, soft prompts are learned through backpropagation and can be tuned to incorporate signal from any number of labeled examples. Our end-to-end learned approach outperforms GPT-3's few-shot learning by a large margin. More remarkably, through ablations on model size using T5, we show that prompt tuning becomes more competitive with scale: as models exceed billions of parameters, our method closes the gap and matches the strong performance of model tuning (where all model weights are tuned). This finding is especially relevant in that large models are costly to share and serve, and the ability to reuse one frozen model for multiple downstream tasks can ease this burden. Our method can be seen as a simplification of the recently proposed prefix tuning of Li and Liang (2021), and we provide a comparison to this and other similar approaches. Finally, we show that conditioning a frozen model with soft prompts confers benefits in robustness to domain transfer, as compared to full model tuning. \n Question, Answer:"
  example_title: "2104.08691"


---


# Prompting-NLP-Paper-to-QA-Generation-abstract-only

This model is a fine-tuned version of [google/flan-t5-large](https://huggingface.co/google/flan-t5-large), trained to generate question-answer pairs from NLP research-paper abstracts. The fine-tuning dataset is not documented here.
It achieves the following result on the evaluation set:
- Loss: 21.0330

## Model description

The model maps an NLP research-paper abstract to a question-answer pair about that paper, using only the abstract as input (hence *abstract-only* in the model name). The expected input wraps the abstract in the prompt `Make Question-Answer pair correspond to the following research paper.\n [Abstract] <abstract> \n Question, Answer:`, as shown in the widget examples above.

## Intended uses & limitations

The model is intended for generating question-answer pairs from research-paper abstracts using the prompt format described above; a minimal inference sketch follows. Limitations (for example, behavior on abstracts outside NLP or the factual accuracy of generated answers) are not documented.
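
Below is a minimal inference sketch using the `transformers` library. The repository id (`your-namespace/...`) is a placeholder, and the generation settings (`max_new_tokens`, `num_beams`) are illustrative assumptions, not settings taken from this card.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Placeholder repository id: replace "your-namespace" with the actual namespace.
model_id = "your-namespace/Prompting-NLP-Paper-to-QA-Generation-abstract-only"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

abstract = "The dominant sequence transduction models are based on ..."  # paper abstract text

# Prompt format used by the widget examples in this card.
prompt = (
    "Make Question-Answer pair correspond to the following research paper.\n"
    f" [Abstract] {abstract} \n Question, Answer:"
)

inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
# Generation settings here are illustrative assumptions.
outputs = model.generate(**inputs, max_new_tokens=128, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```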

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 16
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 184
- num_epochs: 15
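
For readers who want to approximate this configuration, here is a hedged sketch using `Seq2SeqTrainingArguments`; the output directory, evaluation strategy, and `predict_with_generate` flag are assumptions filled in for illustration rather than values taken from this card.

```python
from transformers import Seq2SeqTrainingArguments

# Sketch of the hyperparameters listed above; "output_dir" is a placeholder.
# Adam betas=(0.9, 0.999) and epsilon=1e-08 are the transformers defaults.
training_args = Seq2SeqTrainingArguments(
    output_dir="Prompting-NLP-Paper-to-QA-Generation-abstract-only",
    learning_rate=1e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    seed=42,
    gradient_accumulation_steps=16,   # effective train batch size: 16
    lr_scheduler_type="linear",
    warmup_steps=184,
    num_train_epochs=15,
    evaluation_strategy="epoch",      # assumption: matches the per-epoch eval log below
    predict_with_generate=True,       # assumption: typical for seq2seq fine-tuning
)
```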

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| No log        | 0.99  | 46   | 42.8265         |
| 36.8265       | 1.99  | 92   | 41.8626         |
| 36.8265       | 2.98  | 138  | 39.9479         |
| 35.1011       | 3.97  | 184  | 37.2276         |
| 35.1011       | 4.97  | 230  | 33.5552         |
| 28.7673       | 5.96  | 276  | 25.3570         |
| 28.7673       | 6.95  | 322  | 22.8463         |
| 20.3737       | 7.95  | 368  | 22.0063         |
| 20.3737       | 8.94  | 414  | 21.5694         |
| 19.2477       | 9.93  | 460  | 21.3303         |
| 19.2477       | 10.93 | 506  | 21.1698         |
| 18.9724       | 11.92 | 552  | 21.0922         |
| 18.9724       | 12.91 | 598  | 21.0487         |
| 18.9072       | 13.91 | 644  | 21.0365         |
| 18.9072       | 14.9  | 690  | 21.0330         |


### Framework versions

- Transformers 4.35.2
- Pytorch 2.1.0+cu118
- Datasets 2.15.0
- Tokenizers 0.15.0