garyw committed on
Commit
1b89a03
1 Parent(s): 030baee

Update README.md

  1. README.md +42 -0
README.md CHANGED
@@ -1,3 +1,45 @@
  ---
  license: gpl-3.0
  ---
+
+ Pre-trained word embeddings built from the text of published scientific manuscripts. The embeddings have 300 dimensions and were trained with the fastText algorithm on all available manuscripts in the [PMC Open Access Subset](https://www.ncbi.nlm.nih.gov/pmc/tools/openftlist/). See the paper: https://pubmed.ncbi.nlm.nih.gov/34920127/
+
+ Citation:
+
+ ```
+ @article{flamholz2022word,
+   title={Word embeddings trained on published case reports are lightweight, effective for clinical tasks, and free of protected health information},
+   author={Flamholz, Zachary N and Crane-Droesch, Andrew and Ungar, Lyle H and Weissman, Gary E},
+   journal={Journal of Biomedical Informatics},
+   volume={125},
+   pages={103971},
+   year={2022},
+   publisher={Elsevier}
+ }
+ ```
+
+ ## Quick start
+
+ The word embeddings are stored in a format compatible with the [`gensim` Python package](https://radimrehurek.com/gensim/).
+
+ First download the files from this archive. Then load the embeddings into Python.
+
+ ```python
+ # Only FastText is needed for this example. Word2Vec and KeyedVectors are
+ # imported for the other model types (KeyedVectors is used to load the GloVe models).
+ from gensim.models import FastText, Word2Vec, KeyedVectors
+
+ # Load the model
+ model = FastText.load('ft_oa_all_300d.bin')
+
+ # Return the 300-dimensional vector representation of each word.
+ # Indexing model.wv is equivalent to word_vec(), which is deprecated in gensim 4.x.
+ model.wv['diabetes']
+ model.wv['cardiac_arrest']
+ model.wv['lymphangioleiomyomatosis']
+
+ # Try out cosine similarity between pairs of terms
+ model.wv.similarity('copd', 'chronic_obstructive_pulmonary_disease')
+ model.wv.similarity('myocardial_infarction', 'heart_attack')
+ model.wv.similarity('lymphangioleiomyomatosis', 'lam')
+ ```
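
The `similarity` calls in the Quick start report the cosine similarity between two word vectors. As a rough illustration of what that number means, here is a minimal sketch in plain Python. The helper `cosine_similarity` and the toy vectors are hypothetical and are not part of this repository or of `gensim`; real vectors from this model would have 300 components.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors standing in for real embeddings.
vec_a = [1.0, 2.0, 0.0, 1.0]
vec_b = [2.0, 4.0, 0.0, 2.0]  # same direction as vec_a, twice the length
vec_c = [0.0, 0.0, 3.0, 0.0]  # orthogonal to vec_a

print(cosine_similarity(vec_a, vec_b))  # close to 1: same direction
print(cosine_similarity(vec_a, vec_c))  # 0: no shared direction
```

With the model loaded, the same helper could be applied to `model.wv['copd']` and `model.wv['chronic_obstructive_pulmonary_disease']`. The result should closely match the value returned by `model.wv.similarity`, since vector length does not affect the cosine.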