shengz committed on
Commit
d3d9d71
1 Parent(s): e3ef0b9

Update README.md

Files changed (1)
  1. README.md +40 -1
README.md CHANGED
@@ -17,6 +17,44 @@ See [Zhang et al., 2021](https://arxiv.org/abs/2112.07887) for the details.
17
 
18
  Note that some prior systems like [BioSyn](https://aclanthology.org/2020.acl-main.335.pdf), [SapBERT](https://aclanthology.org/2021.naacl-main.334.pdf), and their follow-up work (e.g., [Lai et al., 2021](https://aclanthology.org/2021.findings-emnlp.140.pdf)) claimed to do entity linking, but their systems completely ignore the context of an entity mention and can only predict a surface form in the entity dictionary (see Figure 1 in [BioSyn](https://aclanthology.org/2020.acl-main.335.pdf)), _**not the canonical entity ID (e.g., CUI in UMLS)**_. Therefore, they cannot disambiguate ambiguous mentions. For instance, given the entity mention "_ER_" in the sentence "*ER crowding has become a widespread problem*", their systems ignore the sentence context and simply predict the closest surface form, which is just "ER". Multiple entities share this surface form as a potential name or alias, such as *Emergency Room (C0562508)*, *Estrogen Receptor Gene (C1414461)*, and *Endoplasmic Reticulum (C0014239)*. Without the context information, their systems cannot resolve this ambiguity and pinpoint the correct entity, *Emergency Room (C0562508)*. Worse, their evaluation would count such an ambiguous prediction as correct. Consequently, the results reported in their papers do not reflect true entity linking performance.
19
 
20
  ## Citation
21
 
22
  If you find KRISSBERT useful in your research, please cite the following paper:
@@ -30,4 +68,5 @@ If you find KRISSBERT useful in your research, please cite the following paper:
30
  eprinttype = {arXiv},
31
  eprint = {2112.07887},
32
  }
33
- ```
 
17
 
18
  Note that some prior systems like [BioSyn](https://aclanthology.org/2020.acl-main.335.pdf), [SapBERT](https://aclanthology.org/2021.naacl-main.334.pdf), and their follow-up work (e.g., [Lai et al., 2021](https://aclanthology.org/2021.findings-emnlp.140.pdf)) claimed to do entity linking, but their systems completely ignore the context of an entity mention and can only predict a surface form in the entity dictionary (see Figure 1 in [BioSyn](https://aclanthology.org/2020.acl-main.335.pdf)), _**not the canonical entity ID (e.g., CUI in UMLS)**_. Therefore, they cannot disambiguate ambiguous mentions. For instance, given the entity mention "_ER_" in the sentence "*ER crowding has become a widespread problem*", their systems ignore the sentence context and simply predict the closest surface form, which is just "ER". Multiple entities share this surface form as a potential name or alias, such as *Emergency Room (C0562508)*, *Estrogen Receptor Gene (C1414461)*, and *Endoplasmic Reticulum (C0014239)*. Without the context information, their systems cannot resolve this ambiguity and pinpoint the correct entity, *Emergency Room (C0562508)*. Worse, their evaluation would count such an ambiguous prediction as correct. Consequently, the results reported in their papers do not reflect true entity linking performance.
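
The ambiguity described above can be illustrated with a small toy sketch. The dictionary below is hypothetical and contains only the three "ER" entities named in the paragraph; the function names are made up for illustration and are not part of BioSyn, SapBERT, or KRISSBERT. It shows that a lookup keyed on the surface form alone returns several candidate CUIs and has no way to choose between them.

```python
from typing import Dict, List, Tuple

# Hypothetical toy dictionary: lowercased surface form -> (entity name, CUI).
# The three entries are the "ER" entities mentioned in the text.
ENTITY_DICTIONARY: Dict[str, List[Tuple[str, str]]] = {
    "er": [
        ("Emergency Room", "C0562508"),
        ("Estrogen Receptor Gene", "C1414461"),
        ("Endoplasmic Reticulum", "C0014239"),
    ],
}


def candidates_for_surface_form(mention: str) -> List[Tuple[str, str]]:
    """Return every entity whose name or alias matches the mention string."""
    return ENTITY_DICTIONARY.get(mention.strip().lower(), [])


def is_ambiguous(mention: str) -> bool:
    """A mention is ambiguous when its surface form maps to more than one CUI."""
    cuis = {cui for _, cui in candidates_for_surface_form(mention)}
    return len(cuis) > 1


# The sentence context is never consulted, so all three candidates remain.
for name, cui in candidates_for_surface_form("ER"):
    print(f"{name} ({cui})")
print("ambiguous:", is_ambiguous("ER"))
```

Resolving the mention to a single CUI requires some signal beyond the surface form, which is the role the sentence context plays in the argument above.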
19
 
20
+
21
+ ## Usage of KRISSBERT for Entity Linking
22
+
23
+ The steps below use the [MedMentions](https://github.com/chanzuckerberg/MedMentions) dataset to show how to 1) **generate prototype embeddings** and 2) **run entity linking**.
24
+
25
+ (We are currently unable to release the self-supervised mention examples, because they require the UMLS and PubMed licenses.)
26
+
27
+
28
+ #### 1. Create a conda environment and install the requirements
29
+ ```bash
30
+ conda create -n kriss -y python=3.8 && conda activate kriss
31
+ pip install -r requirements.txt
32
+ ```
33
+
34
+ #### 2. Change into the [usage](https://huggingface.co/microsoft/BiomedNLP-KRISSBERT-PubMed-UMLS-EL/tree/main/usage) directory
35
+ ```bash
36
+ cd usage
37
+ ```
38
+
39
+ #### 3. Download the MedMentions dataset
40
+
41
+ ```bash
42
+ git clone https://github.com/chanzuckerberg/MedMentions.git
43
+ ```
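
MedMentions is distributed in PubTator format, so a short sketch of reading mention annotations may help when inspecting the data. The parser below is only an illustration and is not taken from `generate_prototypes.py` or `run_entity_linking.py`; the sample record (PMID, title, abstract, and the `T000` semantic type) is made up. Check the MedMentions repository for the actual file locations and the exact entity ID format.

```python
from typing import Iterable, List, NamedTuple


class Mention(NamedTuple):
    pmid: str
    start: int           # character offset where the mention begins
    end: int             # character offset just past the mention
    text: str            # mention surface form
    semantic_types: str  # comma-separated semantic type IDs
    entity_id: str       # linked entity ID (a UMLS CUI)


def parse_pubtator(lines: Iterable[str]) -> List[Mention]:
    """Collect annotation lines; skip title ("|t|") and abstract ("|a|") lines."""
    mentions = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Annotation lines have six tab-separated fields.
        if len(fields) == 6:
            pmid, start, end, text, types, entity_id = fields
            mentions.append(
                Mention(pmid, int(start), int(end), text, types, entity_id)
            )
    return mentions


# Made-up record for illustration only; not copied from the corpus.
SAMPLE = [
    "12345|t|ER crowding has become a widespread problem",
    "12345|a|A made-up abstract used only for this example.",
    "12345\t0\t2\tER\tT000\tC0562508",
    "",
]

for mention in parse_pubtator(SAMPLE):
    print(mention.text, mention.entity_id)
```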
44
+
45
+ #### 4. Generate prototype embeddings
46
+ ```bash
47
+ python generate_prototypes.py
48
+ ```
49
+
50
+ #### 5. Run entity linking
51
+ ```bash
52
+ python run_entity_linking.py
53
+ ```
54
+
55
+ This will give you about `58.3%` top-1 accuracy.
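
For readers unfamiliar with the metric, here is a minimal sketch of nearest-prototype linking and top-1 accuracy. It is not the code in `run_entity_linking.py`: the two-dimensional vectors are toy values rather than KRISSBERT embeddings, the use of cosine similarity is an assumption, and the names `link` and `top1_accuracy` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def link(mention_vec: Vector, prototypes: List[Tuple[str, Vector]]) -> str:
    """Return the CUI of the prototype most similar to the mention vector."""
    best_cui, _ = max(prototypes, key=lambda p: cosine(mention_vec, p[1]))
    return best_cui


def top1_accuracy(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of mentions whose top-ranked prediction equals the gold CUI."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        return 0.0
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)


# Toy prototypes and mention vectors, purely illustrative.
PROTOTYPES = [("C0562508", [1.0, 0.0]), ("C1414461", [0.0, 1.0])]
mention_vectors = [[0.9, 0.1], [0.2, 0.8]]
gold_cuis = ["C0562508", "C0562508"]

predicted = [link(vec, PROTOTYPES) for vec in mention_vectors]
print(predicted, top1_accuracy(predicted, gold_cuis))
```

Under this definition, a prediction counts as correct only when the single highest-ranked CUI matches the gold annotation.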
56
+
57
+
58
  ## Citation
59
 
60
  If you find KRISSBERT useful in your research, please cite the following paper:
68
  eprinttype = {arXiv},
69
  eprint = {2112.07887},
70
  }
71
+ ```
72
+ [https://arxiv.org/pdf/2112.07887.pdf](https://arxiv.org/pdf/2112.07887.pdf)