---
license: mit
datasets:
- bigbio/chemdner
- ncbi_disease
- jnlpba
- bigbio/n2c2_2018_track2
- bigbio/bc5cdr
widget:
- text: Drug<SEP>He was given aspirin and paracetamol.
language:
- en
metrics:
- precision
- recall
- f1
pipeline_tag: token-classification
tags:
- token-classification
- biology
- medical
- zero-shot
- few-shot
library_name: transformers
---
# Zero- and few-shot NER for biomedical texts

## Model description
The model takes two strings as input. String1 is the NER label: a word or phrase naming the entity type (e.g. `Drug`). String2 is a short text in which entities matching String1 are searched for semantically.
The model outputs a list of zeros and ones, one per token of String2 (tokens as produced by the transformer tokenizer), where 1 marks tokens that belong to an occurrence of the named entity.

## Example of usage
```python
import torch
from transformers import AutoTokenizer
from transformers import BertForTokenClassification

modelname = 'ProdicusII/ZeroShotBioNER'  # model path on the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained(modelname)  # load the tokenizer of that model
string1 = 'Drug'  # String1: the entity label to search for
string2 = 'No recent antibiotics or other nephrotoxins, and no symptoms of UTI with benign UA.'  # String2: the text
encodings = tokenizer(string1, string2, is_split_into_words=False,
                      padding=True, truncation=True, add_special_tokens=True, return_offsets_mapping=False,
                      max_length=512, return_tensors='pt')

model = BertForTokenClassification.from_pretrained(modelname, num_labels=2)
model.eval()
with torch.no_grad():
    outputs = model(**encodings)
prediction_logits = outputs.logits  # shape: (batch_size, sequence_length, 2)
predicted_labels = prediction_logits.argmax(dim=-1)  # 1 = token belongs to the entity, 0 = it does not
print(predicted_labels)
```

## Available classes

The following datasets and entity types were used for training, so they can be used as labels in the first segment (String1). Note that multiword strings have been merged.


* NCBI
  * Specific Disease 
  * Composite Mention 
  * Modifier 
  * Disease Class
* BIORED
  * Sequence Variant 
  * Gene Or Gene Product 
  * Disease Or Phenotypic Feature 
  * Chemical Entity 
  * Cell Line 
  * Organism Taxon 
* CDR
  * Disease
  * Chemical
* CHEMDNER
  * Chemical
  * Chemical Family
* JNLPBA
  * Protein
  * DNA 
  * Cell Type 
  * Cell Line 
  * RNA 
* n2c2
  * Drug
  * Frequency 
  * Strength
  * Dosage
  * Form
  * Reason
  * Route
  * ADE 
  * Duration

Beyond these classes, the model can be used in a zero-shot regime with other entity labels, and it can also be fine-tuned on a few examples of other classes.
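For few-shot fine-tuning, each annotated example has to be turned into one label per token. The sketch below assumes character-level entity annotations on String2 and uses the offsets returned by a fast tokenizer: a String2 token gets 1 if it overlaps an annotated span and 0 otherwise, while String1 and special tokens get -100, the default `ignore_index` of PyTorch's `CrossEntropyLoss`, so they do not contribute to the loss. The function name `build_token_labels` and the -100 masking are assumptions for illustration; the authors' actual training procedure is in the repository linked below.

```python
from typing import List, Optional, Tuple

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_token_labels(offsets: List[Tuple[int, int]],
                       sequence_ids: List[Optional[int]],
                       entity_spans: List[Tuple[int, int]]) -> List[int]:
    """Hypothetical helper: one training label per token of an encoded (String1, String2) pair.

    offsets      -- (start, end) character offsets of each token within its own segment
    sequence_ids -- None for special tokens, 0 for String1, 1 for String2
    entity_spans -- annotated (start, end) character spans in String2, end exclusive
    """
    labels: List[int] = []
    for (start, end), seq_id in zip(offsets, sequence_ids):
        if seq_id != 1:
            labels.append(IGNORE_INDEX)  # label segment and special tokens are not scored
        elif any(start < span_end and span_start < end for span_start, span_end in entity_spans):
            labels.append(1)  # token overlaps an annotated entity
        else:
            labels.append(0)
    return labels
```

The offsets can be requested with `return_offsets_mapping=True` when calling the tokenizer, and the sequence ids with `encodings.sequence_ids(0)`; the `offset_mapping` entry has to be removed from the encodings before they are passed to the model.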



## Code availability

The code used for training and testing the model is available at https://github.com/br-ai-ns-institute/Zero-ShotNER

## Citation