---
license: mit
---
# Model Card for Model longluu/Clinical-NER-NCBI-Disease-GatorTronS
This model is a fine-tuned language model for named entity recognition (NER): it classifies each word in a text into clinical categories.
## Model Details
### Model Description
The base pretrained model is GatorTronS, which was trained on billions of words of clinical text (https://huggingface.co/UFNLP/gatortronS).
I then fine-tuned it on the NCBI Disease dataset (https://www.sciencedirect.com/science/article/pii/S1532046413001974?via%3Dihub)
for an NER task in which the model classifies each word in a text into one of the categories ['no disease', 'disease', 'disease-continue'].
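
The three categories behave like a begin/continue tagging scheme: 'disease' opens a mention and 'disease-continue' extends it. The following is a minimal sketch of how per-word labels could be grouped into disease mentions. The function name `extract_disease_mentions` and the example sentence with its labels are hypothetical and written by hand for illustration; they are not output from the model.

```python
from typing import List

def extract_disease_mentions(words: List[str], labels: List[str]) -> List[str]:
    """Group consecutive 'disease' / 'disease-continue' words into mentions."""
    mentions: List[str] = []
    current: List[str] = []
    for word, label in zip(words, labels):
        if label == "disease":
            # A new mention starts; close any mention that is still open.
            if current:
                mentions.append(" ".join(current))
            current = [word]
        elif label == "disease-continue" and current:
            # Extend the mention that is currently open.
            current.append(word)
        else:
            # 'no disease', or a stray continuation with no open mention.
            if current:
                mentions.append(" ".join(current))
            current = []
    if current:
        mentions.append(" ".join(current))
    return mentions

# Hand-labelled example (an assumption, not a model prediction).
words = ["Mutations", "in", "BRCA1", "cause", "breast", "cancer",
         "and", "ovarian", "cancer", "."]
labels = ["no disease", "no disease", "no disease", "no disease",
          "disease", "disease-continue",
          "no disease", "disease", "disease-continue", "no disease"]

print(extract_disease_mentions(words, labels))
# prints ['breast cancer', 'ovarian cancer']
```

In this sketch a 'disease-continue' label that does not follow an open mention is dropped; treating it as the start of a new mention would be an equally reasonable choice.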
### Model Sources
The GitHub code associated with the model can be found here: https://github.com/longluu/LLM-NER-clinical-text.
## Training Details
### Training Data
The dataset contains the disease name and concept annotations of the NCBI disease corpus, a collection of 793 PubMed abstracts fully annotated at the mention and concept level to serve as a research resource for the biomedical natural language processing community. Details are available at https://www.sciencedirect.com/science/article/pii/S1532046413001974?via%3Dihub.
The preprocessed data used for training can be found at https://huggingface.co/datasets/ncbi_disease.
#### Training Hyperparameters
The hyperparameters are:

- `--batch_size 24`
- `--num_train_epochs 5`
- `--learning_rate 5e-5`
- `--weight_decay 0.01`
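
How these flags are consumed is defined by the training script in the repository. As a rough sketch only, the snippet below parses the same four flags with `argparse` and maps them onto keyword names used by Hugging Face `TrainingArguments`. The helper names are hypothetical, and treating `--batch_size` as the per-device training batch size is an assumption.

```python
import argparse

def parse_hyperparameters(argv):
    """Parse the four flags listed in this model card (hypothetical entry point)."""
    parser = argparse.ArgumentParser(description="Hypothetical fine-tuning entry point")
    parser.add_argument("--batch_size", type=int, default=24)
    parser.add_argument("--num_train_epochs", type=int, default=5)
    parser.add_argument("--learning_rate", type=float, default=5e-5)
    parser.add_argument("--weight_decay", type=float, default=0.01)
    return parser.parse_args(argv)

def to_training_kwargs(args):
    """Map the parsed flags to TrainingArguments-style keyword names."""
    return {
        # Assumption: --batch_size means the per-device training batch size.
        "per_device_train_batch_size": args.batch_size,
        "num_train_epochs": args.num_train_epochs,
        "learning_rate": args.learning_rate,
        "weight_decay": args.weight_decay,
    }

args = parse_hyperparameters([
    "--batch_size", "24",
    "--num_train_epochs", "5",
    "--learning_rate", "5e-5",
    "--weight_decay", "0.01",
])
training_kwargs = to_training_kwargs(args)
print(training_kwargs)
```

With the `transformers` library installed, a dictionary like this could be unpacked into `TrainingArguments(output_dir=..., **training_kwargs)`, where the output directory is left to the user.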
## Evaluation
### Testing Data, Factors & Metrics
#### Testing Data
The model was trained on the training set and validated on the validation set, then tested on a separate test set.
Note that some concepts in the test set do not appear in the training and validation sets.
#### Metrics
We report several classification metrics: macro-averaged F1, precision, recall, and the Matthews correlation coefficient.
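
To make these metrics concrete, here is a minimal pure-Python sketch that computes macro-averaged precision, recall, and F1 together with the multiclass Matthews correlation coefficient from per-word labels. It is not the evaluation code used for this model. The function name `classification_metrics`, the integer label ids (0 = 'no disease', 1 = 'disease', 2 = 'disease-continue'), the choice to macro-average all three scores, and the toy labels are assumptions for illustration.

```python
import math
from typing import Dict, List

def classification_metrics(y_true: List[int], y_pred: List[int],
                           num_classes: int) -> Dict[str, float]:
    # confusion[t][p] counts items with true class t predicted as class p.
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        confusion[t][p] += 1

    true_counts = [sum(confusion[k]) for k in range(num_classes)]
    pred_counts = [sum(confusion[t][k] for t in range(num_classes))
                   for k in range(num_classes)]

    precisions, recalls, f1s = [], [], []
    for k in range(num_classes):
        tp = confusion[k][k]
        precision = tp / pred_counts[k] if pred_counts[k] else 0.0
        recall = tp / true_counts[k] if true_counts[k] else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    # Multiclass Matthews correlation coefficient from the confusion matrix.
    n = len(y_true)
    correct = sum(confusion[k][k] for k in range(num_classes))
    numerator = correct * n - sum(t * p for t, p in zip(true_counts, pred_counts))
    denominator = math.sqrt(
        (n * n - sum(p * p for p in pred_counts))
        * (n * n - sum(t * t for t in true_counts))
    )
    mcc = numerator / denominator if denominator else 0.0

    return {
        "f1": sum(f1s) / num_classes,
        "precision": sum(precisions) / num_classes,
        "recall": sum(recalls) / num_classes,
        "matthews_correlation": mcc,
    }

# Toy labels: one 'disease-continue' word is missed by the prediction.
y_true = [0, 0, 0, 1, 2, 0, 1, 2]
y_pred = [0, 0, 0, 1, 2, 0, 1, 0]
metrics = classification_metrics(y_true, y_pred, num_classes=3)
print(metrics)
```

Macro averaging gives each category equal weight regardless of how often it occurs, which matters here because 'no disease' words typically far outnumber the other two categories.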
### Results
{'f1': 0.9230959441861525,
'precision': 0.8998375309216448,
'recall': 0.948772382840148,
'matthews_correlation': 0.8978492834665438}
## Model Card Contact
Feel free to reach out to me at thelong20.4@gmail.com if you have any questions or suggestions.