Upload 4 files
- README.md +67 -0
- config.json +36 -0
- pytorch_model.bin +3 -0
- vocab.txt +0 -0
README.md
ADDED
@@ -0,0 +1,67 @@
---
language: "en"
tags:
- bert
- medical
- clinical
- assertion
- negation
- text-classification
widget:
- text: "Patient denies [entity] SOB [entity]."

---

# Clinical Assertion / Negation Classification BERT

## Model description

The Clinical Assertion and Negation Classification BERT is introduced in the paper [Assertion Detection in Clinical Notes: Medical Language Models to the Rescue?](https://aclanthology.org/2021.nlpmc-1.5/). The model helps structure information in clinical patient letters by classifying medical conditions mentioned in the letter as PRESENT, ABSENT or POSSIBLE.

The model is based on the [ClinicalBERT - Bio + Discharge Summary BERT Model](https://huggingface.co/emilyalsentzer/Bio_Discharge_Summary_BERT) by Alsentzer et al. and fine-tuned on assertion data from the [2010 i2b2 challenge](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3168320/).

#### How to use the model

You can load the model via the transformers library:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, TextClassificationPipeline

tokenizer = AutoTokenizer.from_pretrained("bvanaken/clinical-assertion-negation-bert")
model = AutoModelForSequenceClassification.from_pretrained("bvanaken/clinical-assertion-negation-bert")
```

The model expects input in the form of spans/sentences with one marked entity to classify as `PRESENT(0)`, `ABSENT(1)` or `POSSIBLE(2)`. The entity in question is marked by surrounding it with the special token `[entity]`.

Example input and inference:

```python
input = "The patient recovered during the night and now denies any [entity] shortness of breath [entity]."

classifier = TextClassificationPipeline(model=model, tokenizer=tokenizer)

classification = classifier(input)
# [{'label': 'ABSENT', 'score': 0.9842607378959656}]
```
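Before classification, the target condition has to be wrapped in `[entity]` markers as described above. A minimal sketch of that preprocessing step (the `mark_entity` helper is illustrative and not part of the model repo):

```python
def mark_entity(text: str, entity: str) -> str:
    # Wrap the first occurrence of `entity` in the [entity] markers
    # the model expects around the span to classify.
    return text.replace(entity, f"[entity] {entity} [entity]", 1)

print(mark_entity("The patient denies any shortness of breath.", "shortness of breath"))
# The patient denies any [entity] shortness of breath [entity].
```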

### Cite

When working with the model, please cite our paper as follows:

```bibtex
@inproceedings{van-aken-2021-assertion,
    title = "Assertion Detection in Clinical Notes: Medical Language Models to the Rescue?",
    author = "van Aken, Betty and
      Trajanovska, Ivana and
      Siu, Amy and
      Mayrdorfer, Manuel and
      Budde, Klemens and
      Loeser, Alexander",
    booktitle = "Proceedings of the Second Workshop on Natural Language Processing for Medical Conversations",
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.nlpmc-1.5",
    doi = "10.18653/v1/2021.nlpmc-1.5"
}
```
config.json
ADDED
@@ -0,0 +1,36 @@
{
  "architectures": [
    "BertForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "finetuning_task": "text_classification",
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "PRESENT",
    "1": "ABSENT",
    "2": "POSSIBLE"
  },
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "label2id": {
    "PRESENT": 0,
    "ABSENT": 1,
    "POSSIBLE": 2
  },
  "language": "english",
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "name": "Bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "transformers_version": "4.6.1",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 28997
}
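The `id2label` mapping in this config is what resolves the classifier's output index to a label name. A minimal sketch of that lookup in plain Python (the `label_for` helper and the example logits are illustrative, not part of the repo):

```python
import json

# id2label mapping as declared in config.json above
config = json.loads('{"id2label": {"0": "PRESENT", "1": "ABSENT", "2": "POSSIBLE"}}')

def label_for(logits):
    # Take the index of the highest logit and look up its label name,
    # as the pipeline does internally via the config's id2label mapping.
    idx = max(range(len(logits)), key=logits.__getitem__)
    return config["id2label"][str(idx)]

print(label_for([0.1, 3.2, -1.0]))  # highest score at index 1 -> "ABSENT"
```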
pytorch_model.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a5eb2077bb4192ba2ef24496c24b6c15fd2c7cc6d332fdb07170f4d602658221
size 433339913
vocab.txt
ADDED
The diff for this file is too large to render.
See raw diff