yonghuiwu committed on
Commit
902359a
1 Parent(s): 711d814

Update README.md

  1. README.md +27 -0
README.md CHANGED
@@ -1,3 +1,30 @@
1
  ---
2
  license: apache-2.0
3
  ---
4
+ <h2>GatorTron-Base overview </h2>
5
+
6
+ Developed jointly by the University of Florida and NVIDIA, GatorTron-Base is a large language model with 345 million parameters, pre-trained using a BERT architecture implemented in the Megatron package (https://github.com/NVIDIA/Megatron-LM).
7
+
8
+ GatorTron-Base is pre-trained using a dataset consisting of:
9
+
10
+ - 82B words of de-identified clinical notes from the University of Florida Health System,
11
+ - 6.1B words from PubMed CC0,
12
+ - 2.5B words from WikiText,
13
+ - 0.5B words of de-identified clinical notes from MIMIC-III
14
+
15
+ <h2>De-identification</h2>
16
+
17
+ We applied a de-identification system to remove protected health information (PHI) from clinical text. We adopted the safe-harbor method to identify the 18 PHI categories defined in the Health Insurance Portability and Accountability Act (HIPAA) and replaced them with dummy strings (e.g., replacing people’s names with [\*\*NAME\*\*]).
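
The replacement step described above can be illustrated with a minimal sketch. It assumes PHI spans have already been detected by some upstream system; the span format, the `mask_phi` helper, and the `DATE` category label are hypothetical and are not taken from the GatorTron de-identification system itself.

```python
from typing import List, Tuple

# Hypothetical span format: (start, end, category), with end exclusive.
Span = Tuple[int, int, str]


def mask_phi(text: str, spans: List[Span]) -> str:
    """Replace each detected PHI span with a [**CATEGORY**] dummy string."""
    pieces = []
    cursor = 0
    # Process spans left to right so character offsets stay valid.
    for start, end, category in sorted(spans):
        if start < cursor:
            raise ValueError("overlapping PHI spans")
        pieces.append(text[cursor:start])   # text before the PHI span
        pieces.append(f"[**{category}**]")  # dummy string for the span
        cursor = end
    pieces.append(text[cursor:])            # trailing text after the last span
    return "".join(pieces)


# Made-up example note; the offsets are assumed detector output.
note = "John Smith was seen on 03/14/2019."
spans = [(0, 10, "NAME"), (23, 33, "DATE")]
masked = mask_phi(note, spans)
print(masked)
```

Working from character offsets rather than string matching keeps the sketch independent of how the PHI spans were found.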
18
+
19
+ The de-identification system is described in:
20
+
21
+ Yang X, Lyu T, Li Q, Lee C-Y, Bian J, Hogan WR, Wu Y†. A study of deep learning methods for de-identification of clinical notes in cross-institute settings. BMC Med Inform Decis Mak. 2019 Dec 5;19(5):232. https://www.ncbi.nlm.nih.gov/pubmed/31801524.
22
+
23
+ <h2>Citation info</h2>
24
+
25
+ Yang X, Chen A, PourNejatian N, Shin HC, Smith KE, Parisien C, Compas C, Martin C, Costa AB, Flores MG, Zhang Y, Magoc T, Harle CA, Lipori G, Mitchell DA, Hogan WR, Shenkman EA, Bian J, Wu Y†. A large language model for electronic health records. npj Digit Med. Nature Publishing Group; 2022 Dec 26;5(1):1–9. https://www.nature.com/articles/s41746-022-00742-2
26
+
27
+ <h2>Contact</h2>
28
+
29
+ - Yonghui Wu: yonghui.wu 'at' ufl.edu
30
+ - Cheng Peng: c.peng 'at' ufl.edu