
COVID-19 Related Question Answering (Closed-Book Question Answering)

In 2020, COVID-19, which is caused by a coronavirus called SARS-CoV-2, took over the world. It touched the lives of many people and caused a lot of hardship for humanity. There are still many open questions about COVID-19, and it is often difficult to find the right answers. The aim of this project is to fine-tune models for closed-book question answering. In closed-book QA, we feed the model a question without any context or access to external knowledge and train it to predict the answer. Since the model doesn't receive any context, the primary way it can learn to answer these questions is based on the "knowledge" it obtained during pre-training [1] [2].
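
To make the closed-book setup concrete, here is a minimal sketch of how a QA pair could be rendered in T5's text-to-text format; the `question:` prefix and the field names are illustrative assumptions, not the project's final preprocessing.

```python
# A minimal sketch of the closed-book text-to-text format. The model sees only
# the question (no supporting passage) and must generate the answer from the
# knowledge it memorized during pre-training. The "question:" prefix and the
# field names are illustrative assumptions, not the project's final format.
def to_closed_book_example(question, answer):
    return {
        "input_text": f"question: {question}",  # no context is appended
        "target_text": answer,
    }

print(to_closed_book_example("What virus causes COVID-19?", "SARS-CoV-2"))
```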

The main goals of this project are:

  1. Train a model for question answering regarding COVID-19.
  2. Release the top-performing models for further research and enhancement.
  3. Release all of the preprocessing and postprocessing scripts and findings for future research.

TO DO LIST:

  • Team members met and the following was discussed:
    • A data preparation script that mixes CORD-19 and PubMed has been prepared.
    • Agreed to finalize the training scripts by 9 pm PDT on 7/9/2021.
    • The tokenizer is now trained.
  • Set up the pretraining script
  • Prepare the fine-tuning tasks, inspired by the T5 Trivia Colab

1. Model

We will be using the T5 model.
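
As a rough illustration, the sketch below runs closed-book inference with the Transformers library; the publicly available `t5-small` checkpoint is a placeholder, not the fine-tuned COVID-19 model this project will produce.

```python
# A rough closed-book inference sketch. "t5-small" is a placeholder public
# checkpoint; the project's fine-tuned COVID-19 model would be used instead.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Only the question is supplied -- no supporting context passage.
inputs = tokenizer("question: What virus causes COVID-19?", return_tensors="pt")
output_ids = model.generate(**inputs, max_length=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```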

2. Datasets

The following datasets will be used for intermediate pre-training and fine-tuning the model. Note that the last dataset is optional, and the model is evaluated only on Covid-QA. A loading and preprocessing sketch follows the lists below.

For Intermediate Pre-Training:

  1. CORD-19

For Fine-Tuning:

  1. Covid-QA
  2. CDC-QA
  3. Optional - Trivia-QA
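
As a rough sketch of the preprocessing referenced above, Covid-QA can be loaded with the `datasets` library and its contexts dropped so the examples fit the closed-book format; the dataset identifier `covid_qa_deepset` and the SQuAD-style field names are assumptions that should be verified against the Hub.

```python
# A rough sketch of loading Covid-QA and dropping the context so examples fit
# the closed-book format. The dataset id "covid_qa_deepset" and the SQuAD-style
# field names ("question", "answers") are assumptions; verify them on the Hub.
from datasets import load_dataset

covid_qa = load_dataset("covid_qa_deepset", split="train")

def strip_context(example):
    # Keep only the question and the first gold answer; discard the passage.
    return {
        "input_text": "question: " + example["question"],
        "target_text": example["answers"]["text"][0],
    }

closed_book = covid_qa.map(strip_context, remove_columns=covid_qa.column_names)
print(closed_book[0])
```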

3. Training Scripts

We can make use of the following:

  1. Scripts for preprocessing and mixing the datasets
  2. Scripts for T5 training
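
The sketch below shows one possible fine-tuning loop using the Transformers `Seq2SeqTrainer`; the tiny in-memory dataset, the `t5-small` checkpoint, and the hyperparameters are placeholders and may differ from the project's actual training scripts.

```python
# A rough fine-tuning sketch with the Transformers Seq2SeqTrainer. The tiny
# in-memory dataset, the "t5-small" checkpoint, and the hyperparameters are
# placeholders; the project's actual training scripts may look different.
from datasets import Dataset
from transformers import (
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
    T5ForConditionalGeneration,
    T5Tokenizer,
)

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Stand-in for the mixed, closed-book-formatted fine-tuning data.
closed_book = Dataset.from_dict({
    "input_text": ["question: What virus causes COVID-19?"],
    "target_text": ["SARS-CoV-2"],
})

def tokenize(batch):
    model_inputs = tokenizer(batch["input_text"], max_length=64, truncation=True)
    labels = tokenizer(text_target=batch["target_text"], max_length=32, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = closed_book.map(tokenize, batched=True, remove_columns=closed_book.column_names)

training_args = Seq2SeqTrainingArguments(
    output_dir="t5-covid-qa",          # placeholder output path
    per_device_train_batch_size=8,
    num_train_epochs=3,
    learning_rate=3e-4,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```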

4. Additional Reading
