---
license: apache-2.0
language:
- en
tags:
- medical
---
Dataset: https://www.kaggle.com/datasets/timmayer/covid-news-articles-2020-2022
A comprehensive guide can be found here: https://medium.com/@shankar.arunp/easily-build-your-own-gpt-from-scratch-using-aws-51811b6355d3
The model is GPT-2, further pre-trained on the news articles in the dataset above so that it picks up COVID-19-related context.
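Further pre-training a causal language model such as GPT-2 usually means concatenating the tokenized articles into one stream and cutting that stream into fixed-length blocks. The snippet below is a minimal sketch of that data-preparation step, not necessarily the pipeline used for this model. The function name `group_into_blocks` and the parameters `block_size` and `eos_id` are assumptions made for illustration.

```python
from typing import Dict, List


def group_into_blocks(
    token_ids_per_article: List[List[int]],
    block_size: int,
    eos_id: int,
) -> List[Dict[str, List[int]]]:
    """Concatenate tokenized articles and cut them into equal-length blocks.

    Hypothetical helper: an end-of-text id is appended after each article,
    and any trailing tokens that do not fill a whole block are dropped.
    """
    stream: List[int] = []
    for ids in token_ids_per_article:
        stream.extend(ids)
        stream.append(eos_id)  # mark the article boundary

    # Keep only as many tokens as fit into whole blocks.
    usable = (len(stream) // block_size) * block_size

    blocks: List[Dict[str, List[int]]] = []
    for start in range(0, usable, block_size):
        chunk = stream[start:start + block_size]
        # For causal language modeling the labels are a copy of the inputs.
        blocks.append({"input_ids": chunk, "labels": list(chunk)})
    return blocks


if __name__ == "__main__":
    # Toy token ids standing in for three tokenized news articles.
    toy_articles = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
    for block in group_into_blocks(toy_articles, block_size=4, eos_id=0):
        print(block)
```

With the GPT-2 tokenizer, `eos_id` would be 50256 (the `<|endoftext|>` token), and `block_size` can be at most 1024, the model's context length. The labels are left unshifted because the Hugging Face causal language modeling heads shift them internally when computing the loss.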
A similar article on how to train a BERT base model from scratch on the same articles can be found here: https://medium.com/@shankar.arunp/training-bert-from-scratch-on-your-custom-domain-data-a-step-by-step-guide-with-amazon-25fcbee4316a