---
license: apache-2.0
---

## RadBERT-RoBERTa-4m

This is a base model of the RadBERT (Radiology-BERT) models from UC San Diego and the VA healthcare system. It is initialized from BERT-base-uncased and further trained on 2 million radiology reports de-identified from US VA hospitals. The model achieves stronger medical language understanding performance than previous medical-domain models such as BioBERT, Clinical-BERT, BLUE-BERT, and BioMed-RoBERTa.

Performance is evaluated on three tasks:

(a) abnormal sentence classification: classify sentences in radiology reports as reporting abnormal or normal findings;
(b) report coding: assign a diagnostic code to a given radiology report under five different coding systems;
(c) report summarization: given the findings section of a radiology report, extractively select the key sentences that summarize the findings.

It also shows superior performance on other radiology NLP tasks that are not reported in the paper.
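
As an illustration of task (a), here is a minimal fine-tuning sketch. It is not part of the original release: the classification head, label convention, and example sentences are assumptions, and you would plug this into your own training loop or the `Trainer` API.

```python
# Hypothetical sketch of fine-tuning RadBERT for abnormal sentence classification.
# Labels (0 = normal, 1 = abnormal) and the example sentences are illustrative only.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained('zzxslp/RadBERT-RoBERTa-4m')
model = AutoModelForSequenceClassification.from_pretrained(
    'zzxslp/RadBERT-RoBERTa-4m', num_labels=2  # adds a randomly initialized classification head
)

sentences = ["No acute cardiopulmonary abnormality.",
             "There is a 1.2 cm nodule in the right upper lobe."]
labels = torch.tensor([0, 1])

batch = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
outputs = model(**batch, labels=labels)

outputs.loss.backward()                       # cross-entropy loss for one training step
predictions = outputs.logits.argmax(dim=-1)   # per-sentence class predictions
```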

For details, check out the paper here:
[RadBERT: Adapting transformer-based language models to radiology](https://pubs.rsna.org/doi/abs/10.1148/ryai.210258)

### How to use

Here is an example of how to use this model to extract the features of a given text in PyTorch:

```python
from transformers import AutoConfig, AutoTokenizer, AutoModel

# Load the config, tokenizer, and pretrained weights from the Hugging Face Hub
config = AutoConfig.from_pretrained('zzxslp/RadBERT-RoBERTa-4m')
tokenizer = AutoTokenizer.from_pretrained('zzxslp/RadBERT-RoBERTa-4m')
model = AutoModel.from_pretrained('zzxslp/RadBERT-RoBERTa-4m', config=config)

text = "Replace me by any medical text you'd like."
encoded_input = tokenizer(text, return_tensors='pt')  # tokenize and return PyTorch tensors
output = model(**encoded_input)  # output.last_hidden_state holds per-token features
```
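
The `output` above contains token-level hidden states. If you need a single fixed-size vector per text, one common approach (an assumption; the model card does not prescribe a pooling scheme) is to mean-pool the token features with the attention mask, continuing from the snippet above:

```python
# Mean-pool token embeddings into one sentence-level feature vector,
# masking out padding tokens (a common convention, not specified by the model card).
mask = encoded_input['attention_mask'].unsqueeze(-1)   # (batch, seq_len, 1)
token_states = output.last_hidden_state                # (batch, seq_len, hidden)
sentence_embedding = (token_states * mask).sum(1) / mask.sum(1)
print(sentence_embedding.shape)  # e.g. torch.Size([1, 768]) for a base-size model
```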

### BibTeX entry and citation info

If you use the model, please cite our paper:

```bibtex
@article{yan2022radbert,
  title={RadBERT: Adapting transformer-based language models to radiology},
  author={Yan, An and McAuley, Julian and Lu, Xing and Du, Jiang and Chang, Eric Y and Gentili, Amilcare and Hsu, Chun-Nan},
  journal={Radiology: Artificial Intelligence},
  volume={4},
  number={4},
  pages={e210258},
  year={2022},
  publisher={Radiological Society of North America}
}
```