# long-covid-classification

We fine-tuned bert-base-cased on a manually curated dataset to train a sequence classification model that distinguishes between long COVID and non-long COVID-related documents.
## Used hyperparameters

| Parameter | Value |
|---|---|
| Learning rate | 3e-5 |
| Batch size | 16 |
| Number of epochs | 4 |
| Sequence length | 512 |
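The training script itself is not included in this card. As a minimal sketch, assuming the standard transformers Trainer API, the hyperparameters above would be wired up roughly as follows (dataset preparation omitted, output path assumed):

```python
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Hypothetical reconstruction of the fine-tuning setup; the actual
# training script is not published with this model card.
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-cased", num_labels=2
)

training_args = TrainingArguments(
    output_dir="long-covid-classification",  # assumed output path
    learning_rate=3e-5,
    per_device_train_batch_size=16,
    num_train_epochs=4,
)

# The sequence length of 512 applies at tokenization time, e.g.:
# encodings = tokenizer(texts, truncation=True, padding="max_length", max_length=512)

# trainer = Trainer(model=model, args=training_args, train_dataset=...)  # dataset not public
# trainer.train()
```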
## Metrics

| Precision [%] | Recall [%] | F1-score [%] |
|---|---|---|
| 91.18 | 91.18 | 91.18 |
## How to load the model

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Map the model's output indices to human-readable labels
label_dict = {0: "nonLongCOVID", 1: "longCOVID"}

tokenizer = AutoTokenizer.from_pretrained(
    "llangnickel/long-covid-classification", use_auth_token=True
)
model = AutoModelForSequenceClassification.from_pretrained(
    "llangnickel/long-covid-classification",
    use_auth_token=True,
    num_labels=len(label_dict),
)
```
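Once loaded, the model can be applied to a document as a brief usage sketch; the example text below is made up purely for illustration:

```python
import torch

# Hypothetical input document, for illustration only
text = "The patient reports persistent fatigue months after the acute infection."

# Truncate to the maximum sequence length of 512 used during training
inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

predicted_class = logits.argmax(dim=-1).item()
print(label_dict[predicted_class])  # "longCOVID" or "nonLongCOVID"
```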
## Citation

```bibtex
@article{10.1093/database/baac048,
  author  = {Langnickel, Lisa and Darms, Johannes and Heldt, Katharina and Ducks, Denise and Fluck, Juliane},
  title   = "{Continuous development of the semantic search engine preVIEW: from COVID-19 to long COVID}",
  journal = {Database},
  volume  = {2022},
  year    = {2022},
  month   = {07},
  issn    = {1758-0463},
  doi     = {10.1093/database/baac048},
  url     = {https://doi.org/10.1093/database/baac048},
  note    = {baac048},
  eprint  = {https://academic.oup.com/database/article-pdf/doi/10.1093/database/baac048/44371817/baac048.pdf},
}
```