yonghuiwu committed on
Commit
1bc8648
1 Parent(s): b434d0e

Update README.md

Files changed (1)
  1. README.md +34 -1
README.md CHANGED
@@ -4,7 +4,7 @@ license: apache-2.0
4
 
5
  <h2>GatorTron-Medium overview </h2>
6
 
7
- Developed by a joint effort between the University of Florida and NVIDIA, GatorTron-Medium is a large language model of 3.9 billiom parameters, pre-trained using a BERT architecure implemented in the Megatron package (https://github.com/NVIDIA/Megatron-LM).
8
 
9
  GatorTron-Medium is pre-trained using a dataset consisting of:
10
 
@@ -13,6 +13,25 @@ GatorTron-Medium is pre-trained using a dataset consisting of:
13
  - 2.5B words from WikiText,
14
  - 0.5B words of de-identified clinical notes from MIMIC-III
15
 
16
  <h2>De-identification</h2>
17
 
18
  We applied a de-identification system to remove protected health information (PHI) from clinical text. We adopted the safe-harbor method to identify 18 PHI categories defined in the Health Insurance Portability and Accountability Act (HIPAA) and replaced them with dummy strings (e.g., replacing people’s names with [\*\*NAME\*\*]).
@@ -25,6 +44,20 @@ Yang X, Lyu T, Li Q, Lee C-Y, Bian J, Hogan WR, Wu Y†. A study of deep learnin
25
 
26
  Yang X, Chen A, PourNejatian N, Shin HC, Smith KE, Parisien C, Compas C, Martin C, Costa AB, Flores MG, Zhang Y, Magoc T, Harle CA, Lipori G, Mitchell DA, Hogan WR, Shenkman EA, Bian J, Wu Y†. A large language model for electronic health records. npj Digit Med. Nature Publishing Group; 2022 Dec 26;5(1):1–9. https://www.nature.com/articles/s41746-022-00742-2
27
 
28
  <h2>Contact</h2>
29
 
30
  - Yonghui Wu: yonghui.wu 'at' ufl.edu
 
4
 
5
  <h2>GatorTron-Medium overview </h2>
6
 
7
+ Developed by a joint effort between the University of Florida and NVIDIA, GatorTron-Medium is a clinical language model of 3.9 billion parameters, pre-trained using a BERT architecture implemented in the Megatron package (https://github.com/NVIDIA/Megatron-LM).
8
 
9
  GatorTron-Medium is pre-trained using a dataset consisting of:
10
 
 
13
  - 2.5B words from WikiText,
14
  - 0.5B words of de-identified clinical notes from MIMIC-III
15
 
16
+ The GitHub repository for GatorTron is at: https://github.com/uf-hobi-informatics-lab/GatorTron
17
+
18
+
19
+ <h2>Model variations</h2>
20
+
21
+ Model | Parameters
22
+ --- | ---
23
+ [gatortron-base](https://huggingface.co/UFNLP/gatortron-base)| 345 million
24
+ [gatortronS](https://huggingface.co/UFNLP/gatortronS) | 345 million
25
+ [gatortron-medium (this model)](https://huggingface.co/UFNLP/gatortron-medium) | 3.9 billion
26
+ gatortron-large | 8.9 billion
27
+
28
+ <h2>How to use</h2>
29
+
30
+
31
+ - An NLP package using GatorTron for clinical concept extraction (Named Entity Recognition): https://github.com/uf-hobi-informatics-lab/ClinicalTransformerNER
32
+ - An NLP package using GatorTron for Relation Extraction: https://github.com/uf-hobi-informatics-lab/ClinicalTransformerRelationExtraction
33
+ - An NLP package using GatorTron for extraction of social determinants of health (SDoH) from clinical narratives: https://github.com/uf-hobi-informatics-lab/SDoH_SODA
34
+
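As a complement to the packages listed above, the checkpoint can presumably also be loaded directly with the Hugging Face `transformers` library. The following is a minimal sketch that is not part of this commit: it assumes `transformers` and PyTorch are installed and that the repository ID `UFNLP/gatortron-medium` from the model table resolves through the generic `AutoTokenizer` and `AutoModel` classes. The helper names `load_gatortron` and `encode` are hypothetical.

```python
MODEL_ID = "UFNLP/gatortron-medium"  # repository ID taken from the model table


def load_gatortron(model_id: str = MODEL_ID):
    """Return (tokenizer, model) for a GatorTron checkpoint.

    Hypothetical helper. The import is deferred so that defining the
    function does not itself require transformers to be installed.
    """
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    return tokenizer, model


def encode(text: str, tokenizer, model):
    """Run one piece of text through the encoder and return its last hidden states."""
    inputs = tokenizer(text, return_tensors="pt")  # PyTorch tensors
    outputs = model(**inputs)
    return outputs.last_hidden_state
```

Calling `tokenizer, model = load_gatortron()` downloads the weights on first use, which is a large transfer for a 3.9 billion parameter model; `encode("some clinical sentence", tokenizer, model)` then returns one embedding per token.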
35
  <h2>De-identification</h2>
36
 
37
  We applied a de-identification system to remove protected health information (PHI) from clinical text. We adopted the safe-harbor method to identify 18 PHI categories defined in the Health Insurance Portability and Accountability Act (HIPAA) and replaced them with dummy strings (e.g., replacing people’s names with [\*\*NAME\*\*]).
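The replacement step can be illustrated with a small sketch. It is not the authors' de-identification system: it assumes PHI spans have already been detected by some upstream component and only shows how each span might be swapped for a `[**CATEGORY**]` placeholder. The function name `mask_phi`, the `DATE` label, and the sample note are invented for illustration.

```python
from typing import List, Tuple

# (start, end, PHI category); end is exclusive. Spans are assumed to come
# from an upstream PHI detector, which this sketch does not implement.
Span = Tuple[int, int, str]


def mask_phi(text: str, spans: List[Span]) -> str:
    """Replace each detected PHI span with a [**CATEGORY**] placeholder."""
    pieces = []
    cursor = 0
    for start, end, category in sorted(spans):
        if start < cursor:
            raise ValueError("overlapping PHI spans")
        pieces.append(text[cursor:start])   # text before the span is kept
        pieces.append(f"[**{category}**]")  # the span itself is replaced
        cursor = end
    pieces.append(text[cursor:])            # remainder after the last span
    return "".join(pieces)


# Synthetic example note; not real patient data.
note = "John Smith was admitted on 03/14/2019."
spans = [(0, 10, "NAME"), (27, 37, "DATE")]
print(mask_phi(note, spans))  # [**NAME**] was admitted on [**DATE**].
```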
 
44
 
45
  Yang X, Chen A, PourNejatian N, Shin HC, Smith KE, Parisien C, Compas C, Martin C, Costa AB, Flores MG, Zhang Y, Magoc T, Harle CA, Lipori G, Mitchell DA, Hogan WR, Shenkman EA, Bian J, Wu Y†. A large language model for electronic health records. npj Digit Med. Nature Publishing Group; 2022 Dec 26;5(1):1–9. https://www.nature.com/articles/s41746-022-00742-2
46
 
47
+ - BibTeX entry
48
+ ```
49
+ @article{yang2022large,
50
+ title={A large language model for electronic health records},
51
+ author={Yang, Xi and Chen, Aokun and PourNejatian, Nima and Shin, Hoo Chang and Smith, Kaleb E and Parisien, Christopher and Compas, Colin and Martin, Cheryl and Costa, Anthony B and Flores, Mona G and Zhang, Ying and Magoc, Tanja and Harle, Christopher A and Lipori, Gloria and Mitchell, Duane A and Hogan, William R and Shenkman, Elizabeth A and Bian, Jiang and Wu, Yonghui },
52
+ journal={npj Digital Medicine},
53
+ volume={5},
54
+ number={1},
55
+ pages={194},
56
+ year={2022},
57
+ publisher={Nature Publishing Group UK London}
58
+ }
59
+ ```
60
+
61
  <h2>Contact</h2>
62
 
63
  - Yonghui Wu: yonghui.wu 'at' ufl.edu