long-covid-classification

We fine-tuned bert-base-cased on a manually curated dataset to obtain a sequence classification model that distinguishes documents about long COVID from documents that are not about long COVID.

Hyperparameters used

Parameter          Value
Learning rate      3e-5
Batch size         16
Number of epochs   4
Sequence length    512
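
The values above can be collected in one place for a fine-tuning run. The following is a minimal sketch, assuming the Hugging Face Trainer API is used; the card does not say which training loop the authors used. The names HYPERPARAMETERS and build_training_arguments and the default output directory are hypothetical.

```python
# Hyperparameters as listed in the table above.
HYPERPARAMETERS = {
    "learning_rate": 3e-5,
    "batch_size": 16,
    "num_epochs": 4,
    "max_seq_length": 512,  # applied at tokenization time, not by the trainer
}


def build_training_arguments(output_dir="long-covid-classifier"):
    """Map the table onto transformers.TrainingArguments.

    Assumption: training runs on a single device, so the per-device batch
    size equals the batch size from the table. output_dir is a placeholder.
    """
    # Imported lazily so the hyperparameter dict is usable without transformers.
    from transformers import TrainingArguments

    return TrainingArguments(
        output_dir=output_dir,
        learning_rate=HYPERPARAMETERS["learning_rate"],
        per_device_train_batch_size=HYPERPARAMETERS["batch_size"],
        num_train_epochs=HYPERPARAMETERS["num_epochs"],
    )
```

The sequence length is not a trainer setting; in this sketch it would be passed to the tokenizer as max_length together with truncation=True.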

Metrics

Precision [%]   Recall [%]   F1-score [%]
91.18           91.18        91.18
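
The three values are mutually consistent: the F1-score is the harmonic mean of precision and recall, so it equals both whenever they are equal. A short check using only the reported numbers (the card does not state which averaging scheme was used, so none is assumed here; the function name is illustrative):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same unit in, same unit out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values taken from the metrics table, in percent.
reported_precision = 91.18
reported_recall = 91.18

f1 = f1_score(reported_precision, reported_recall)
print(f"F1-score: {f1:.2f} %")  # prints "F1-score: 91.18 %"
```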

How to load the model

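from transformers import AutoTokenizer, AutoModelForSequenceClassification

# use_auth_token=True reads the access token stored by `huggingface-cli login`.
# Recent transformers releases deprecate this argument in favor of `token`.
tokenizer = AutoTokenizer.from_pretrained("llangnickel/long-covid-classification", use_auth_token=True)
# Mapping from predicted class index to label name
label_dict = {0: "nonLongCOVID", 1: "longCOVID"}
model = AutoModelForSequenceClassification.from_pretrained("llangnickel/long-covid-classification", use_auth_token=True, num_labels=len(label_dict))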
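
Once the tokenizer and model are loaded as shown above, a prediction can be obtained roughly as follows. This is a minimal sketch, assuming PyTorch is installed and that inputs are truncated to the sequence length of 512 listed in the hyperparameters; the helper name classify is hypothetical and not part of the published code.

```python
import torch

# Same mapping as label_dict in the loading snippet.
LABELS = {0: "nonLongCOVID", 1: "longCOVID"}


def classify(texts, tokenizer, model, max_length=512):
    """Return a (label, probability) pair for each input text."""
    # Pad to the longest text in the batch and truncate to max_length tokens.
    inputs = tokenizer(
        texts,
        padding=True,
        truncation=True,
        max_length=max_length,
        return_tensors="pt",
    )
    model.eval()  # disable dropout for inference
    with torch.no_grad():
        logits = model(**inputs).logits
    # Convert logits to class probabilities and pick the most likely class.
    probs = torch.softmax(logits, dim=-1)
    predictions = probs.argmax(dim=-1).tolist()
    return [
        (LABELS[class_id], probs[row, class_id].item())
        for row, class_id in enumerate(predictions)
    ]
```

Calling classify(["Some abstract text"], tokenizer, model) returns a list with one (label, probability) pair per input text.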

Citation

@article{10.1093/database/baac048,
author = {Langnickel, Lisa and Darms, Johannes and Heldt, Katharina and Ducks, Denise and Fluck, Juliane},
title = "{Continuous development of the semantic search engine preVIEW: from COVID-19 to long COVID}",
journal = {Database},
volume = {2022},
year = {2022},
month = {07},
issn = {1758-0463},
doi = {10.1093/database/baac048},
url = {https://doi.org/10.1093/database/baac048},
note = {baac048},
eprint = {https://academic.oup.com/database/article-pdf/doi/10.1093/database/baac048/44371817/baac048.pdf},
}
