whaleloops/keptlongformer

KEPTlongformer is a medical-knowledge-enhanced version of Longformer that was further pre-trained using contrastive learning. As of 11/12/2022, the model achieves state-of-the-art (SOTA) performance on automatic ICD coding on MIMIC-III. A sister model with better performance is available here.

Pre-training

We initialized this model from Clinical-Longformer.

We then pre-trained it with Hierarchical Self-Alignment Pretrain (HSAP) on the UMLS knowledge graph. HSAP covers (a) hierarchy, (b) synonyms, and (c) abbreviations. For more information, see Section 3.3 of the paper. The learning rate was 5e-5, the weight decay was 0.01, and the Adam epsilon was 1e-5.
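The optimizer settings quoted above can be written down as a small configuration. This is a minimal sketch, not the authors' training script: the names `HSAP_OPTIM_KWARGS` and `build_optimizer` are hypothetical, and the choice of `torch.optim.AdamW` is an assumption, since the text mentions an Adam epsilon but does not name the optimizer class.

```python
# Hyperparameters quoted in the pre-training description.
HSAP_OPTIM_KWARGS = {
    "lr": 5e-5,            # learning rate
    "weight_decay": 0.01,  # weight decay
    "eps": 1e-5,           # Adam epsilon
}


def build_optimizer(params):
    """Return an AdamW optimizer configured with the values above.

    Assumption: AdamW is used for illustration only; the model card
    does not state which Adam variant was used.
    """
    import torch  # imported lazily so the config can be read without PyTorch

    return torch.optim.AdamW(params, **HSAP_OPTIM_KWARGS)
```

With PyTorch installed, `build_optimizer(model.parameters())` would return an optimizer with these settings for any `torch.nn.Module` named `model`.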

Usage

See our GitHub repository for how to use this model with prompts for automatic ICD coding.
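For a quick look at the checkpoint outside the prompt-based pipeline, it can probably be loaded with the Hugging Face `transformers` Auto classes. This is a sketch under assumptions: the repository ID is taken from the title of this card, `load_kept_longformer` is a hypothetical helper name, and loading through `AutoModel` is assumed to work for this checkpoint rather than confirmed by the card.

```python
MODEL_ID = "whaleloops/keptlongformer"  # repository ID from the title of this card


def load_kept_longformer(model_id: str = MODEL_ID):
    """Download and return (tokenizer, model) for the given checkpoint."""
    # Imported lazily; requires the `transformers` package and network access.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    return tokenizer, model
```

This only loads the pre-trained encoder. The prompt construction and ICD coding code live in the GitHub repository.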

Used with prompts as in the GitHub repository, the model gives the following results:

Metric	Score
rec_micro	0.5729403619819988
rec_macro	0.11342156911120573
rec_at_8	0.4094837705486378
rec_at_75	0.8470734920535119
rec_at_50	0.8005338782352
rec_at_5	0.2891628170355805
rec_at_15	0.5768778119750537
prec_micro	0.6411968713105065
prec_macro	0.12227610414493029
prec_at_8	0.7760972716488731
prec_at_75	0.197504942665085
prec_at_50	0.2768090154211151
prec_at_5	0.8483392645314354
prec_at_15	0.6178529062870699
f1_micro	0.6051499904242899
f1_macro	0.11768251595637802
f1_at_8	0.536107150495997
f1_at_75	0.32032290907137506
f1_at_50	0.411373195944102
f1_at_5	0.43131028155283435
f1_at_15	0.5966627077602488
auc_micro	0.9651754312635265
auc_macro	0.8566590059725866
acc_micro	0.43384592341105344
acc_macro	0.08639139221100567
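Each F1 row in the table appears to be the harmonic mean of the precision and recall rows with the same suffix. This is an observation from the reported numbers, not something the card states. Under this reading, f1_macro is the harmonic mean of macro precision and macro recall rather than an average of per-label F1 scores. The sketch below recomputes each F1 from the table values; `ROWS` and `harmonic_f1` are names introduced here for illustration.

```python
# (precision, recall, reported F1) copied from the results table.
ROWS = {
    "micro": (0.6411968713105065, 0.5729403619819988, 0.6051499904242899),
    "macro": (0.12227610414493029, 0.11342156911120573, 0.11768251595637802),
    "at_5": (0.8483392645314354, 0.2891628170355805, 0.43131028155283435),
    "at_8": (0.7760972716488731, 0.4094837705486378, 0.536107150495997),
    "at_15": (0.6178529062870699, 0.5768778119750537, 0.5966627077602488),
    "at_50": (0.2768090154211151, 0.8005338782352, 0.411373195944102),
    "at_75": (0.197504942665085, 0.8470734920535119, 0.32032290907137506),
}


def harmonic_f1(precision: float, recall: float) -> float:
    """F1 as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Print the recomputed and reported values side by side.
for name, (p, r, reported) in ROWS.items():
    print(f"{name:>6}: computed={harmonic_f1(p, r):.4f} reported={reported:.4f}")
```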