T5 Question Generation and Question Answering

Model description

This model is a T5 Transformers model (airklizz/t5-base-multi-fr-wiki-news) that was fine-tuned in french on 3 different tasks

  • question generation

  • question answering

  • answer extraction

It obtains quite good results on FQuAD validation dataset.

Intended uses & limitations

This model functions for the 3 tasks mentionned earlier and was not tested on other tasks.

from transformers import T5ForConditionalGeneration, T5Tokenizer
model = T5ForConditionalGeneration.from_pretrained("JDBN/t5-base-fr-qg-fquad")
tokenizer = T5Tokenizer.from_pretrained("JDBN/t5-base-fr-qg-fquad")

Training data

The initial model used was https://huggingface.co/airKlizz/t5-base-multi-fr-wiki-news. This model was finetuned on a dataset composed of FQuAD and PIAF on the 3 tasks mentioned previously.

The data were preprocessed like this

  • question generation: "generate question: Barack Hussein Obama, né le 4 aout 1961, est un homme politique américain et avocat. Il a été élu en 2009 pour devenir le 44ème président des Etats-Unis d'Amérique."

  • question answering: "question: Quand Barack Hussein Obamaa-t-il été élu président des Etats-Unis d’Amérique? context: Barack Hussein Obama, né le 4 aout 1961, est un homme politique américain et avocat. Il a été élu en 2009 pour devenir le 44ème président des Etats-Unis d’Amérique."

  • answer extraction: "extract_answers: Barack Hussein Obama, né le 4 aout 1961, est un homme politique américain et avocat. Il a été élu en 2009 pour devenir le 44ème président des Etats-Unis d’Amérique ."

The preprocessing we used was implemented in https://github.com/patil-suraj/question_generation

Eval results

On FQuAD validation set

BLEU_1 BLEU_2 BLEU_3 BLEU_4 METEOR ROUGE_L CIDEr
0.290 0.203 0.149 0.111 0.197 0.284 1.038

Question Answering metrics

For these metrics, the performance of this question answering model (https://huggingface.co/illuin/camembert-base-fquad) on FQuAD original question and on T5 generated questions are compared.

Questions Exact Match F1 Score
Original FQuAD 54.015 77.466
Generated 45.765 67.306

BibTeX entry and citation info

@misc{githubPatil,
author = {Patil Suraj},
title = {question generation GitHub repository},
year = {2020},
howpublished={\url{https://github.com/patil-suraj/question_generation}}
}

@article{T5,
    title={Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer},
    author={Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu},
    year={2019},
    eprint={1910.10683},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

@misc{dhoffschmidt2020fquad,
      title={FQuAD: French Question Answering Dataset}, 
      author={Martin d'Hoffschmidt and Wacim Belblidia and Tom Brendlé and Quentin Heinrich and Maxime Vidal},
      year={2020},
      eprint={2002.06071},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
Downloads last month
28
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Datasets used to train JDBN/t5-base-fr-qg-fquad