
Model Card of lmqg/bart-large-squad-qg

This model is a fine-tuned version of facebook/bart-large for the question generation task on lmqg/qg_squad (dataset_name: default) via lmqg.

Overview

  • Language model: facebook/bart-large
  • Language: en
  • Training data: lmqg/qg_squad (default)

Usage

  • With lmqg

from lmqg import TransformersQG

# initialize model
model = TransformersQG(language="en", model="lmqg/bart-large-squad-qg")

# model prediction
questions = model.generate_q(list_context="William Turner was an English painter who specialised in watercolour landscapes", list_answer="William Turner")
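
Despite the single strings above, the list_context and list_answer arguments take parallel lists for batched prediction, as in the sketch below (an assumption based on the argument names and lmqg's documented interface):

# batched prediction over parallel lists of contexts and answers
questions = model.generate_q(
    list_context=[
        "William Turner was an English painter who specialised in watercolour landscapes",
        "Beyonce starred as blues singer Etta James in the 2008 musical biopic, Cadillac Records.",
    ],
    list_answer=["William Turner", "Etta James"],
)
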
  • With transformers
from transformers import pipeline

pipe = pipeline("text2text-generation", "lmqg/bart-large-squad-qg")
output = pipe("<hl> Beyonce <hl> further expanded her acting career, starring as blues singer Etta James in the 2008 musical biopic, Cadillac Records.")
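
The <hl> tokens mark the answer span the generated question should target. The pipeline returns a list of dictionaries with a generated_text key, so several highlighted contexts can be processed in one call:

# each input highlights a different answer span with <hl> tokens
contexts = [
    "<hl> Beyonce <hl> further expanded her acting career, starring as blues singer Etta James in the 2008 musical biopic, Cadillac Records.",
    "William Turner was an English painter who specialised in <hl> watercolour landscapes <hl>.",
]
for result in pipe(contexts):
    print(result["generated_text"])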

Evaluation

  • Metric (Question Generation): raw metric file

Metric       Score   Type      Dataset
BERTScore    91.00   default   lmqg/qg_squad
Bleu_1       58.79   default   lmqg/qg_squad
Bleu_2       42.79   default   lmqg/qg_squad
Bleu_3       33.11   default   lmqg/qg_squad
Bleu_4       26.17   default   lmqg/qg_squad
METEOR       27.07   default   lmqg/qg_squad
MoverScore   64.99   default   lmqg/qg_squad
ROUGE_L      53.85   default   lmqg/qg_squad

  • Metric (Question & Answer Generation, Reference Answer): Each question is generated from the gold answer. raw metric file

Metric                           Score   Type      Dataset
QAAlignedF1Score (BERTScore)     95.54   default   lmqg/qg_squad
QAAlignedF1Score (MoverScore)    70.82   default   lmqg/qg_squad
QAAlignedPrecision (BERTScore)   95.59   default   lmqg/qg_squad
QAAlignedPrecision (MoverScore)  71.13   default   lmqg/qg_squad
QAAlignedRecall (BERTScore)      95.49   default   lmqg/qg_squad
QAAlignedRecall (MoverScore)     70.54   default   lmqg/qg_squad

  • Metric (Question & Answer Generation): Each question is generated from an answer predicted by the pipeline, rather than the gold answer. raw metric file

Metric                           Score   Type      Dataset
QAAlignedF1Score (BERTScore)     93.23   default   lmqg/qg_squad
QAAlignedF1Score (MoverScore)    64.76   default   lmqg/qg_squad
QAAlignedPrecision (BERTScore)   93.13   default   lmqg/qg_squad
QAAlignedPrecision (MoverScore)  64.98   default   lmqg/qg_squad
QAAlignedRecall (BERTScore)      93.35   default   lmqg/qg_squad
QAAlignedRecall (MoverScore)     64.63   default   lmqg/qg_squad

  • Metrics (Question Generation, Out-of-Domain)

Dataset                Type          BERTScore   Bleu_4   METEOR   MoverScore   ROUGE_L   Link
lmqg/qg_squadshifts    amazon        90.93       6.53     22.30    60.87        25.03     link
lmqg/qg_squadshifts    new_wiki      93.23       11.12    27.32    66.23        29.68     link
lmqg/qg_squadshifts    nyt           92.49       8.12     25.25    64.06        25.29     link
lmqg/qg_squadshifts    reddit        90.95       5.95     21.50    60.59        22.37     link
lmqg/qg_subjqa         books         88.07       0.63     11.58    55.56        12.37     link
lmqg/qg_subjqa         electronics   87.83       0.87     15.35    56.35        16.02     link
lmqg/qg_subjqa         grocery       87.79       0.53     15.13    57.02        12.34     link
lmqg/qg_subjqa         movies        87.49       0.00     11.86    55.29        12.51     link
lmqg/qg_subjqa         restaurants   87.98       0.00     12.42    55.43        13.08     link
lmqg/qg_subjqa         tripadvisor   88.91       0.00     13.72    56.05        14.03     link
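
The scores above are produced by the lmqg evaluation pipeline and stored in the linked raw metric files. Purely as an illustration (not the exact lmqg setup), similar question generation metrics can be approximated with the Hugging Face evaluate library; the prediction and reference below are hypothetical examples:

import evaluate

# hypothetical generated question and gold reference, for illustration only
predictions = ["What did William Turner specialise in?"]
references = ["What did the English painter William Turner specialise in?"]

bleu = evaluate.load("bleu")            # max_order=4 gives a Bleu_4-style score
rouge = evaluate.load("rouge")          # reports rougeL among other variants
bertscore = evaluate.load("bertscore")  # requires the bert-score package

print(bleu.compute(predictions=predictions, references=[[r] for r in references], max_order=4)["bleu"])
print(rouge.compute(predictions=predictions, references=references)["rougeL"])
print(bertscore.compute(predictions=predictions, references=references, lang="en")["f1"])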

Training hyperparameters

The following hyperparameters were used during fine-tuning:

  • dataset_path: lmqg/qg_squad
  • dataset_name: default
  • input_types: ['paragraph_answer']
  • output_types: ['question']
  • prefix_types: None
  • model: facebook/bart-large
  • max_length: 512
  • max_length_output: 32
  • epoch: 4
  • batch: 32
  • lr: 5e-05
  • fp16: False
  • random_seed: 1
  • gradient_accumulation_steps: 4
  • label_smoothing: 0.15

The full configuration can be found in the fine-tuning config file.
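
The lmqg toolkit drives the actual fine-tuning. For orientation only, here is a minimal sketch of an equivalent run with the plain Hugging Face Trainer API; it assumes the paragraph_answer and question fields of lmqg/qg_squad (matching input_types and output_types above) and is not the authors' training script:

from datasets import load_dataset
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large")
dataset = load_dataset("lmqg/qg_squad")

def preprocess(batch):
    # paragraph_answer holds the paragraph with the answer wrapped in <hl>
    # tokens; question is the target sequence
    inputs = tokenizer(batch["paragraph_answer"], max_length=512, truncation=True)
    labels = tokenizer(text_target=batch["question"], max_length=32, truncation=True)
    inputs["labels"] = labels["input_ids"]
    return inputs

tokenized = dataset.map(preprocess, batched=True, remove_columns=dataset["train"].column_names)

args = Seq2SeqTrainingArguments(
    output_dir="bart-large-squad-qg",
    learning_rate=5e-5,              # lr
    per_device_train_batch_size=32,  # batch
    gradient_accumulation_steps=4,
    num_train_epochs=4,              # epoch
    label_smoothing_factor=0.15,     # label_smoothing
    fp16=False,
    seed=1,                          # random_seed
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()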

Citation

@inproceedings{ushio-etal-2022-generative,
    title = "{G}enerative {L}anguage {M}odels for {P}aragraph-{L}evel {Q}uestion {G}eneration",
    author = "Ushio, Asahi  and
        Alva-Manchego, Fernando  and
        Camacho-Collados, Jose",
    booktitle = "Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing",
    month = dec,
    year = "2022",
    address = "Abu Dhabi, U.A.E.",
    publisher = "Association for Computational Linguistics",
}