garyw committed
Commit 0467703
1 Parent(s): a0f7a2a

Update README.md

Files changed (1): README.md +42 -0
@@ -1,3 +1,45 @@
  ---
  license: gpl-3.0
  ---
+
+ Pre-trained word embeddings built from the text of published clinical case reports. The embeddings have 300 dimensions and were trained with the GloVe algorithm on case reports from the [PMC Open Access Subset](https://www.ncbi.nlm.nih.gov/pmc/tools/openftlist/). For details, see the paper: https://pubmed.ncbi.nlm.nih.gov/34920127/
+
+ Citation:
+
+ ```
+ @article{flamholz2022word,
+   title={Word embeddings trained on published case reports are lightweight, effective for clinical tasks, and free of protected health information},
+   author={Flamholz, Zachary N and Crane-Droesch, Andrew and Ungar, Lyle H and Weissman, Gary E},
+   journal={Journal of Biomedical Informatics},
+   volume={125},
+   pages={103971},
+   year={2022},
+   publisher={Elsevier}
+ }
+ ```
+
+ ## Quick start
+
+ The word embeddings are stored in a format that the [`gensim` Python package](https://radimrehurek.com/gensim/) can load.
+
+ First, download the files from this archive. Then load the embeddings into Python.
+
+ ```python
+ from gensim.models import KeyedVectors  # KeyedVectors is used to load the GloVe vectors
+
+ # Load the model
+ model = KeyedVectors.load_word2vec_format('gl_300_cr.txt')
+
+ # Return the 300-dimensional vector representation of each word
+ # (get_vector is the current name; word_vec is the older alias)
+ model.get_vector('diabetes')
+ model.get_vector('cardiac_arrest')
+ model.get_vector('lymphangioleiomyomatosis')
+
+ # Try out cosine similarity
+ model.similarity('copd', 'chronic_obstructive_pulmonary_disease')
+ model.similarity('myocardial_infarction', 'heart_attack')
+ model.similarity('lymphangioleiomyomatosis', 'lam')
+ ```
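
The quick start looks up terms directly, which raises a `KeyError` in `gensim` when a term is not in the vocabulary. The following is a minimal sketch of a safer lookup, not part of the original README. It assumes, based only on the example terms above, that vocabulary entries are lowercase and that multi-word terms are joined with underscores. The helper names `to_token`, `cosine`, and `safe_similarity` are hypothetical, and the `toy` vectors are invented for illustration. The helpers work with any object that supports `in` and `[]`, such as a plain `dict` or a loaded `KeyedVectors` instance.

```python
import math


def to_token(phrase):
    """Lowercase a phrase and join its words with underscores.

    Assumption: this mirrors the vocabulary convention suggested by the
    examples ('cardiac_arrest', 'heart_attack'); it is not documented.
    """
    return "_".join(phrase.lower().split())


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_u * norm_v)


def safe_similarity(vectors, phrase_a, phrase_b):
    """Return the cosine similarity, or None if either term is missing."""
    a, b = to_token(phrase_a), to_token(phrase_b)
    if a not in vectors or b not in vectors:
        return None
    return cosine(vectors[a], vectors[b])


# Toy 3-dimensional vectors, invented for illustration only;
# the real embeddings have 300 dimensions.
toy = {
    "heart_attack": [1.0, 0.0, 0.0],
    "myocardial_infarction": [1.0, 0.0, 0.0],
    "diabetes": [0.0, 1.0, 0.0],
}
```

With the real embeddings, `safe_similarity(model, 'heart attack', 'myocardial infarction')` should agree with `model.similarity('heart_attack', 'myocardial_infarction')` up to floating-point error, since both compute cosine similarity. Returning `None` for unknown terms lets the caller decide how to handle missing vocabulary.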