# long-covid-classification

We fine-tuned bert-base-cased on a manually curated dataset to train a sequence classification model that distinguishes between long COVID and non-long COVID-related documents.
## Used hyperparameters

| Parameter | Value |
|---|---|
| Learning rate | 3e-5 |
| Batch size | 16 |
| Number of epochs | 4 |
| Sequence length | 512 |
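The training script itself is not included in this card. As a minimal sketch, assuming the standard transformers Trainer API, the hyperparameters above would be wired up roughly as follows (dataset preparation omitted, output path assumed):

```python
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Hypothetical reconstruction of the fine-tuning setup; the actual
# training script is not published with this model card.
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-cased", num_labels=2
)

training_args = TrainingArguments(
    output_dir="long-covid-classification",  # assumed output path
    learning_rate=3e-5,
    per_device_train_batch_size=16,
    num_train_epochs=4,
)

# The sequence length of 512 applies at tokenization time, e.g.:
# encodings = tokenizer(texts, truncation=True, padding="max_length", max_length=512)

# trainer = Trainer(model=model, args=training_args, train_dataset=...)  # dataset not public
# trainer.train()
```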
## Metrics

| Precision [%] | Recall [%] | F1-score [%] |
|---|---|---|
| 91.18 | 91.18 | 91.18 |
## How to load the model

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Map the model's output indices to human-readable labels
label_dict = {0: "nonLongCOVID", 1: "longCOVID"}

tokenizer = AutoTokenizer.from_pretrained(
    "llangnickel/long-covid-classification", use_auth_token=True
)
model = AutoModelForSequenceClassification.from_pretrained(
    "llangnickel/long-covid-classification",
    use_auth_token=True,
    num_labels=len(label_dict),
)
```
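Once loaded, the model can be applied to a document as a brief usage sketch; the example text below is made up purely for illustration:

```python
import torch

# Hypothetical input document, for illustration only
text = "The patient reports persistent fatigue months after the acute infection."

# Truncate to the maximum sequence length of 512 used during training
inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

predicted_class = logits.argmax(dim=-1).item()
print(label_dict[predicted_class])  # "longCOVID" or "nonLongCOVID"
```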
## Citation

```bibtex
@article{10.1093/database/baac048,
  author  = {Langnickel, Lisa and Darms, Johannes and Heldt, Katharina and Ducks, Denise and Fluck, Juliane},
  title   = "{Continuous development of the semantic search engine preVIEW: from COVID-19 to long COVID}",
  journal = {Database},
  volume  = {2022},
  year    = {2022},
  month   = {07},
  issn    = {1758-0463},
  doi     = {10.1093/database/baac048},
  url     = {https://doi.org/10.1093/database/baac048},
  note    = {baac048},
  eprint  = {https://academic.oup.com/database/article-pdf/doi/10.1093/database/baac048/44371817/baac048.pdf},
}
```