asahi417 committed on
Commit bd7ce76
1 Parent(s): 6db56e0

model update

Files changed (4)
  1. README.md +153 -0
  2. config.json +1 -1
  3. pytorch_model.bin +2 -2
  4. tokenizer_config.json +1 -1
README.md ADDED
@@ -0,0 +1,153 @@
---
license: cc-by-4.0
metrics:
- bleu4
- meteor
- rouge-l
- bertscore
- moverscore
language: en
datasets:
- lmqg/qag_tweetqa
pipeline_tag: text2text-generation
tags:
- questions and answers generation
widget:
- text: "generate question and answer: Beyonce further expanded her acting career, starring as blues singer Etta James in the 2008 musical biopic, Cadillac Records."
  example_title: "Questions & Answers Generation Example 1"
- text: "generate question and answer: Beyonce further expanded her acting career, starring as blues singer Etta James in the 2008 musical biopic, Cadillac Records ."
  example_title: "Questions & Answers Generation Example 2"
- text: "generate question and answer: Beyonce further expanded her acting career, starring as blues singer Etta James in the 2008 musical biopic, Cadillac Records."
  example_title: "Questions & Answers Generation Example 3"
model-index:
- name: lmqg/t5-base-tweetqa-qag
  results:
  - task:
      name: Text2text Generation
      type: text2text-generation
    dataset:
      name: lmqg/qag_tweetqa
      type: default
      args: default
    metrics:
    - name: BLEU4
      type: bleu4
      value: 8.760988157508385e-10
    - name: ROUGE-L
      type: rouge-l
      value: 0.004925321600227637
    - name: METEOR
      type: meteor
      value: 0.0028425982865562323
    - name: BERTScore
      type: bertscore
      value: 0.03894749127879461
    - name: MoverScore
      type: moverscore
      value: 0.45595221244168144
---

# Model Card of `lmqg/t5-base-tweetqa-qag`
This model is a fine-tuned version of [t5-base](https://huggingface.co/t5-base) for the question and answer generation task on the
[lmqg/qag_tweetqa](https://huggingface.co/datasets/lmqg/qag_tweetqa) dataset (dataset_name: default) via [`lmqg`](https://github.com/asahi417/lm-question-generation).
It is trained for end-to-end question and answer generation, i.e. it generates question-answer pairs directly from a paragraph.

Please cite our paper if you use the model ([https://arxiv.org/abs/2210.03992](https://arxiv.org/abs/2210.03992)).

```
@inproceedings{ushio-etal-2022-generative,
    title = "{G}enerative {L}anguage {M}odels for {P}aragraph-{L}evel {Q}uestion {G}eneration",
    author = "Ushio, Asahi  and
      Alva-Manchego, Fernando  and
      Camacho-Collados, Jose",
    booktitle = "Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing",
    month = dec,
    year = "2022",
    address = "Abu Dhabi, U.A.E.",
    publisher = "Association for Computational Linguistics",
}
```

### Overview
- **Language model:** [t5-base](https://huggingface.co/t5-base)
- **Language:** en
- **Training data:** [lmqg/qag_tweetqa](https://huggingface.co/datasets/lmqg/qag_tweetqa) (default)
- **Online Demo:** [https://autoqg.net/](https://autoqg.net/)
- **Repository:** [https://github.com/asahi417/lm-question-generation](https://github.com/asahi417/lm-question-generation)
- **Paper:** [https://arxiv.org/abs/2210.03992](https://arxiv.org/abs/2210.03992)

### Usage
- With [`lmqg`](https://github.com/asahi417/lm-question-generation#lmqg-language-model-for-question-generation-)

```python
from lmqg import TransformersQG

# initialize the model
model = TransformersQG(language='en', model='lmqg/t5-base-tweetqa-qag')

# generate question-answer pairs from a paragraph (end-to-end question & answer generation)
question_answer_pairs = model.generate_qa("William Turner was an English painter who specialised in watercolour landscapes")
```

- With `transformers`

```python
from transformers import pipeline

# initialize the pipeline
pipe = pipeline("text2text-generation", 'lmqg/t5-base-tweetqa-qag')

# question and answer generation (note the "generate question and answer: " task prefix)
output = pipe('generate question and answer: Beyonce further expanded her acting career, starring as blues singer Etta James in the 2008 musical biopic, Cadillac Records.')
```

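The pipeline returns raw generated text, so the question-answer pairs still have to be parsed out of it. The sketch below continues from the `output` variable above and assumes the lmqg QAG output convention of `question: ..., answer: ...` pairs joined by `" | "`; the separator and field names are assumptions, so check the model's actual output before relying on them.

```python
# Minimal post-processing sketch (ASSUMED output format: pairs joined by " | ",
# each chunk formatted as "question: ..., answer: ..."); adjust if the model differs.
raw = output[0]["generated_text"]

question_answer_pairs = []
for chunk in raw.split(" | "):
    if "answer:" not in chunk:
        continue  # skip chunks that do not contain an answer field
    question_part, answer_part = chunk.split("answer:", 1)
    question = question_part.replace("question:", "").strip().rstrip(",").strip()
    answer = answer_part.strip()
    question_answer_pairs.append((question, answer))

print(question_answer_pairs)
```
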

## Evaluation Metrics

### Metrics

| Dataset | Type | BLEU4 | ROUGE-L | METEOR | BERTScore | MoverScore | Link |
|:--------|:-----|------:|--------:|-------:|----------:|-----------:|-----:|
| [lmqg/qag_tweetqa](https://huggingface.co/datasets/lmqg/qag_tweetqa) | default | 0.0 | 0.005 | 0.003 | 0.039 | 0.456 | [link](https://huggingface.co/lmqg/t5-base-tweetqa-qag/raw/main/eval/metric.first.sentence.paragraph.questions_answers.lmqg_qag_tweetqa.default.json) |

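The scores in the table are rounded; the full evaluation output lives in the JSON file linked above. A minimal sketch for pulling it down with `huggingface_hub` (the key layout inside the file is not documented here, so the snippet only loads and prints whatever it contains):

```python
import json
from huggingface_hub import hf_hub_download

# download the evaluation file linked in the table above
path = hf_hub_download(
    repo_id="lmqg/t5-base-tweetqa-qag",
    filename="eval/metric.first.sentence.paragraph.questions_answers.lmqg_qag_tweetqa.default.json",
)

# load and print the raw metric dictionary
with open(path) as f:
    metrics = json.load(f)

print(json.dumps(metrics, indent=2))
```
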
## Training hyperparameters

The following hyperparameters were used during fine-tuning:
- dataset_path: lmqg/qag_tweetqa
- dataset_name: default
- input_types: ['paragraph']
- output_types: ['questions_answers']
- prefix_types: ['qag']
- model: t5-base
- max_length: 256
- max_length_output: 128
- epoch: 12
- batch: 64
- lr: 0.0001
- fp16: False
- random_seed: 1
- gradient_accumulation_steps: 2
- label_smoothing: 0.15

The full configuration can be found at [fine-tuning config file](https://huggingface.co/lmqg/t5-base-tweetqa-qag/raw/main/trainer_config.json).

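The same values can be read programmatically from the linked `trainer_config.json`. A minimal sketch, assuming only that the file at the URL above is flat JSON:

```python
import json
from urllib.request import urlopen

# fine-tuning configuration referenced above
URL = "https://huggingface.co/lmqg/t5-base-tweetqa-qag/raw/main/trainer_config.json"

with urlopen(URL) as response:
    config = json.load(response)

# key names are expected to mirror the hyperparameter list above (epoch, batch, lr, ...)
for key, value in sorted(config.items()):
    print(f"{key}: {value}")
```
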
## Citation
```
@inproceedings{ushio-etal-2022-generative,
    title = "{G}enerative {L}anguage {M}odels for {P}aragraph-{L}evel {Q}uestion {G}eneration",
    author = "Ushio, Asahi  and
      Alva-Manchego, Fernando  and
      Camacho-Collados, Jose",
    booktitle = "Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing",
    month = dec,
    year = "2022",
    address = "Abu Dhabi, U.A.E.",
    publisher = "Association for Computational Linguistics",
}
```
config.json CHANGED
@@ -1,5 +1,5 @@
 {
-  "_name_or_path": "lmqg_output/t5_base_tweetqa/best_model",
+  "_name_or_path": "lmqg_output/t5_base_tweetqa/model_btmsre/epoch_10",
   "add_prefix": true,
   "architectures": [
     "T5ForConditionalGeneration"
pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:5cc96390d85606c2778544074ac4c2bc5ad8164e7b7f75a1ccd4eba87cda091f
-size 891614207
+oid sha256:679f0cdf0a261d630169d9a11ac0d27ca6f1a10db016d5d066e0c4a539d2c101
+size 891617855
tokenizer_config.json CHANGED
@@ -104,7 +104,7 @@
   "eos_token": "</s>",
   "extra_ids": 100,
   "model_max_length": 512,
-  "name_or_path": "lmqg_output/t5_base_tweetqa/best_model",
+  "name_or_path": "lmqg_output/t5_base_tweetqa/model_btmsre/epoch_10",
   "pad_token": "<pad>",
   "special_tokens_map_file": null,
   "tokenizer_class": "T5Tokenizer",