
# ClinicalNoteBERT

Using openly available clinical notes, we pretrain ClinicalNoteBERT, a family of encoders at three model sizes (110M, 67M, and 14.5M parameters) that take note contexts and variations into account during pretraining. We evaluate ClinicalNoteBERT on a range of downstream applications, including fine-tuning tasks, unsupervised semantic textual similarity, retrieval-augmented generation with LLMs, and unimodal and multimodal clinical prediction, and compare against strong baselines. Our models achieve better results than baseline models of similar or larger size across various tasks and datasets. We find that different choices made during pretraining lead to varied improvements on the downstream tasks. The small and tiny versions of ClinicalNoteBERT maintain over 96% and 91% of the best performance with less than 61% and 14% of the parameters, respectively.
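The encoders follow the standard BERT interface, so they can be loaded with the `transformers` library. Below is a minimal usage sketch; the repository id `ClinicalNoteBERT-base` is a placeholder and should be replaced with the actual checkpoint name on the Hub.

```python
from transformers import AutoTokenizer, AutoModel

# Placeholder repository id; substitute the published ClinicalNoteBERT checkpoint name.
model_name = "ClinicalNoteBERT-base"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

# Encode a short clinical note snippet and obtain contextual token embeddings.
inputs = tokenizer(
    "Patient presents with shortness of breath and bilateral lower extremity edema.",
    return_tensors="pt",
    truncation=True,
)
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch_size, sequence_length, hidden_size)
```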

## Overall performance

| Model | # Params | FT | STS | RAG | CP | Fusion |
|---|---:|---:|---:|---:|---:|---:|
| ClinicalNoteBERT-note-only | 110M | 80.0 | 78.9 | 14.0 | 63.8 | 66.5 |
| ClinicalNoteBERT-note-ntp | 110M | 80.6 | 73.6 | 13.0 | 62.9 | 65.8 |
| ClinicalNoteBERT-base | 110M | 80.1 | 79.8 | 12.3 | 64.0 | 66.7 |
| ClinicalNoteBERT-small | 67M | 78.1 | 77.1 | 11.4 | 64.6 | 66.8 |
| ClinicalNoteBERT-tiny | 14.5M | 74.1 | 75.7 | 8.9 | 62.4 | 65.5 |

FT: fine-tuning. STS: semantic textual similarity (ClinicalSTS). RAG: retrieval-augmented generation (GPT-2, Llama 2). CP: clinical prediction. Fusion: multimodal fusion for clinical prediction.

When encoding text sequences for STS, RAG, and CP/Fusion, the ClinicalNoteBERT models are further adapted with unsupervised SimCSE training on sequences of varied lengths and types. The sentence-level, segment-level, and note-level variants are used for STS, RAG, and CP/Fusion, respectively, to match the typical input lengths of each task. An embedding sketch is shown below; more details can be found in the paper.
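The sketch below illustrates how a SimCSE-adapted encoder could be used for ClinicalSTS-style similarity scoring. The checkpoint name is a placeholder, and the [CLS] pooling shown here is an assumption (mean pooling over tokens is a common alternative); consult the paper for the exact configuration.

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Placeholder name for the sentence-level SimCSE-adapted encoder.
model_name = "ClinicalNoteBERT-base-simcse-sentence"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
model.eval()

def embed(texts):
    # Tokenize a batch of clinical sentences and pool a single vector per sentence.
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state
    return hidden[:, 0]  # [CLS] pooling (assumed); mean pooling is another option

a = embed(["The patient denies chest pain."])
b = embed(["No chest pain reported by the patient."])
similarity = torch.nn.functional.cosine_similarity(a, b).item()
print(f"STS score (cosine similarity): {similarity:.3f}")
```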

## Citation

Under review
