---
language: en
thumbnail: "url to a thumbnail used in social sharing"
tags:
- text-classification
license: cc
datasets:
- MIMIC-III
widget:
- text: "This report discusses the diagnosis of lung cancer in a female patient who has never smoked."

---

## Model information:
This model is the [scibert_scivocab_uncased](https://huggingface.co/allenai/scibert_scivocab_uncased) model finetuned on radiology report texts from the MIMIC-III database. The task performed was text classification, used to benchmark this model against a selection of other BERT variants for classifying MIMIC-III radiology report texts into two classes. Labels of [0,1] were assigned to the reports: a label of 1 was given to radiology reports linked to an ICD9 diagnosis code for lung cancer, and a label of 0 to a random sample of reports that were not linked to any cancer diagnosis code at all.

## Intended uses:
This model is intended to be used to classify texts in order to identify the presence of lung cancer. The model will predict labels of [0,1].

## Limitations:
Note that the dataset and model may not be fully representative or suitable for all needs. It is recommended that the paper for the dataset and the base model card are reviewed before use:
- [MIMIC-III](https://www.nature.com/articles/sdata201635.pdf)
- [scibert_scivocab_uncased](https://huggingface.co/allenai/scibert_scivocab_uncased)


## How to use:
Load the tokenizer and model from the library using the following checkpoint:
```python
from transformers import AutoTokenizer, AutoModel

# Load the tokenizer and base encoder weights from the fine-tuned checkpoint
tokenizer = AutoTokenizer.from_pretrained("sarahmiller137/scibert-scivocab-uncased-ft-m3-lc")
model = AutoModel.from_pretrained("sarahmiller137/scibert-scivocab-uncased-ft-m3-lc")
```
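Since the model was finetuned for classification, it can also be loaded with a sequence classification head to obtain label predictions directly. The following is a minimal sketch, assuming the checkpoint includes the fine-tuned classification head and that the labels follow the [0,1] scheme described above (1 = report linked to a lung cancer ICD9 code, 0 = no cancer code):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = "sarahmiller137/scibert-scivocab-uncased-ft-m3-lc"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

text = "This report discusses the diagnosis of lung cancer in a female patient who has never smoked."
inputs = tokenizer(text, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**inputs).logits

# Assumed label mapping: 1 = lung cancer, 0 = no cancer diagnosis code
predicted_label = logits.argmax(dim=-1).item()
print(predicted_label)
```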