Biomedical NLP papers
Papers posted on @ArxivHealthcareNLP@sigmoid.social (Clinical, Healthcare & Biomedical NLP)
Paper • 2404.14779 • Published • Note: This study presents a comprehensive analysis and comparison of full-parameter vs. parameter-efficient tuning in the context of medical LLMs. We developed and refined a series of LLMs, based on the Llama-2 architecture, specifically designed to enhance medical knowledge retrieval, reasoning, and question-answering capabilities.
emrQA-msquad: A Medical Dataset Structured with the SQuAD V2.0 Framework, Enriched with emrQA Medical Information
Paper • 2404.12050 • Published • Note: In this work, we introduce emrQA-msquad, a medical dataset structured with the SQuAD V2.0 framework and enriched with emrQA medical information. It comprises 160k questions and 4k manually obtained answers, aimed at enhancing the accuracy of Medical QA systems. We also finetuned BERT-type models on the dataset.
Medical mT5: An Open-Source Multilingual Text-to-Text LLM for The Medical Domain
Paper • 2404.07613 • Published • Note: In this paper, we address these shortcomings by compiling, to the best of our knowledge, the largest multilingual corpus for the medical domain in four languages, namely English, French, Italian and Spanish. This new corpus has been used to train Medical mT5, the first open-source text-to-text multilingual model for the medical domain.
MedExpQA: Multilingual Benchmarking of Large Language Models for Medical Question Answering
Paper • 2404.05590 • Published • Note: In this paper we present MedExpQA, the first multilingual benchmark based on medical exams to evaluate LLMs in Medical Question Answering. To the best of our knowledge, MedExpQA includes for the first time reference gold explanations written by doctors, which can be leveraged to establish various gold-based upper bounds for comparison with LLM performance.
Small Language Models Learn Enhanced Reasoning Skills from Medical Textbooks
Paper • 2404.00376 • Published • Note: We introduce Meerkat-7B, a novel medical AI system with 7 billion parameters. Meerkat-7B was trained using our new synthetic dataset consisting of high-quality chain-of-thought reasoning paths sourced from 18 medical textbooks, along with diverse instruction-following datasets.
Evaluating Large Language Models for Health-Related Text Classification Tasks with Public Social Media Data
Paper • 2403.19031 • Published • Note: In this paper, we benchmarked various machine learning models, including classic SVMs, pretrained language models like RoBERTa, BERTweet, and SocBERT, and LLMs such as GPT-3.5 and GPT-4, across six text classification tasks using public social media data. We use LLMs either zero-shot, as annotators, or for data augmentation.
BioMedLM: A 2.7B Parameter Language Model Trained On Biomedical Text
Paper • 2403.18421 • Published • Note: In this article, we release BioMedLM, a 2.7 billion parameter GPT-style autoregressive model trained exclusively on PubMed abstracts and full articles. When fine-tuned, BioMedLM can produce strong multiple-choice biomedical QA results competitive with much larger models, such as achieving a score of 57.3% on MedMCQA (dev) and 69.0% on the MMLU Medical Genetics exam.
A Dataset for Pharmacovigilance in German, French, and Japanese: Annotating Adverse Drug Reactions across Languages
Paper • 2403.18336 • Published • Note: This work presents a multilingual corpus of texts concerning ADRs gathered from diverse sources, including patient fora, social media, and clinical reports in German, French, and Japanese. Our corpus contains annotations covering 12 entity types, four attribute types, and 13 relation types.
Large Language Models in Biomedical and Health Informatics: A Bibliometric Review
Paper • 2403.16303 • Published • Note: In this review, we conducted a bibliometric analysis of research articles and collaboration networks from 2022 to 2023 to understand the application of LLMs in Biomedical and Health Informatics. We mapped out key trends and major developments, highlighting how LLMs enhance NLP applications in medical diagnosis, patient engagement, and personalized medicine.
Large Language Model for Mental Health: A Systematic Review
Paper • 2403.15401 • Published • Note: In this review, we discuss the research methodology used in the paper. The methodology chapter explains the data collection and analysis methods, including the type of research conducted, data collection techniques, and any tools or materials used. It also justifies the methodological choices made, allowing readers to evaluate the reliability and validity of the research.
Polaris: A Safety-focused LLM Constellation Architecture for Healthcare
Paper • 2403.13313 • Published • Note: We develop Polaris, the first safety-focused LLM constellation for real-time patient-AI healthcare conversations. Unlike prior LLM works in healthcare, our work specifically focuses on long multi-turn voice conversations. We train our models on proprietary data, clinical care plans, healthcare regulatory documents, medical manuals, and other medical reasoning documents.
Electrocardiogram Instruction Tuning for Report Generation
Paper • 2403.04945 • Published • Note: We propose the Multimodal ECG Instruction Tuning (MEIT) framework, the first attempt to tackle ECG report generation with LLMs and multimodal instructions. To facilitate future research, we establish a benchmark to evaluate MEIT with various LLM backbones across two large-scale ECG datasets. Our approach uniquely aligns the representations of the ECG signal and the report.
Apollo: Lightweight Multilingual Medical LLMs towards Democratizing Medical AI to 6B People
Paper • 2403.03640 • Published • Note: In this article, we describe both the creation of the ApolloCorpora multilingual medical dataset and the XMedBench benchmark, and the training of our Apollo models, state-of-the-art LLMs of various relatively small sizes (i.e., 0.5B, 1.8B, 2B, 6B, and 7B) which are capable of answering queries in the six most widely spoken languages worldwide.
To Generate or to Retrieve? On the Effectiveness of Artificial Contexts for Medical Open-Domain Question Answering
Paper • 2403.01924 • Published • Note: This paper presents MedGENIE, the first generate-then-read framework for multiple-choice question answering in medicine, which entails constructing artificial contexts through prompting instead of retrieving the context from PubMed. We conduct extensive experiments on MedQA-USMLE, MedMCQA, and MMLU.
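The generate-then-read pipeline described above can be sketched as two chained LLM calls (a hedged illustration: `generate` and `read` are placeholders for real model calls, and the prompt wording is hypothetical, not the paper's):

```python
def generate_then_read(question, options, generate, read):
    """Generate-then-read sketch: instead of retrieving a passage, first
    prompt an LLM to write an artificial context for the question, then
    answer the multiple-choice question conditioned on that context."""
    # Step 1: generate an artificial context (replaces retrieval).
    context = generate(
        f"Write a short background passage useful for answering: {question}"
    )
    # Step 2: read, i.e. answer the question grounded in the generated context.
    prompt = (
        f"Context: {context}\n"
        f"Question: {question}\n"
        f"Options: {', '.join(options)}\n"
        f"Answer:"
    )
    return read(prompt)
```

With stub functions in place of the two LLM calls, the pipeline simply threads the generated context into the answering prompt.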
Leveraging Biomolecule and Natural Language through Multi-Modal Learning: A Survey
Paper • 2403.01528 • Published • Note: In this review, we provide an extensive analysis of recent advancements achieved through cross modeling of biomolecules and natural language. The study begins with an overview of biomolecular representations and delves into the integration of linguistic and molecular data, assessing its practical applications and resources.
KorMedMCQA: Multi-Choice Question Answering Benchmark for Korean Healthcare Professional Licensing Examinations
Paper • 2403.01469 • Published • Note: We introduce KorMedMCQA, the first Korean multiple-choice QA benchmark derived from Korean healthcare professional licensing examinations, covering the years 2012 to 2023. This dataset consists of a selection of questions from the license examinations for doctors, nurses, and pharmacists, featuring a diverse array of subjects.
Benchmarking Large Language Models on Answering and Explaining Challenging Medical Questions
Paper • 2402.18060 • Published • Note: In this study, we construct two new datasets: JAMA Clinical Challenge and Medbullets. The first consists of questions based on challenging clinical cases, while the second comprises USMLE Step 2&3 style clinical questions. Both datasets are structured as multiple-choice QA tasks, where each question is accompanied by an expert-written explanation.
Adaptation of Biomedical and Clinical Pretrained Models to French Long Documents: A Comparative Study
Paper • 2402.16689 • Published • Note: In this paper, we present a comparative study of three adaptation strategies for long-sequence models, leveraging the Longformer architecture. We conducted evaluations of these models on 16 downstream tasks. Our findings reveal that further pre-training an English clinical model with French biomedical texts can outperform alternatives.
Towards Building Multilingual Language Model for Medicine
Paper • 2402.13963 • Published • Note: In this paper, we aim to develop an open-source, multilingual language model for medicine that benefits a wider, linguistically diverse audience from different regions. We construct MMedC, a new multilingual medical corpus (25.5B tokens across six languages), and a new MCQA benchmark with rationales. We then finetuned several LLMs and evaluated them on the benchmark.
Benchmarking Retrieval-Augmented Generation for Medicine
Paper • 2402.13178 • Published • Note: This work proposes the Medical Information Retrieval-Augmented Generation Evaluation (MIRAGE), a first-of-its-kind benchmark including 7,663 questions from five medical QA datasets, and discovers a log-linear scaling property and the "lost-in-the-middle" effects in medical RAG.
Efficiency at Scale: Investigating the Performance of Diminutive Language Models in Clinical Tasks
Paper • 2402.10597 • Published • Note: In this study, we compare different Parameter Efficient Fine-tuning (PEFT) methods for clinical natural language processing tasks, using various sizes of language models. We evaluate the performance of these methods on three clinical tasks: de-identification, assertion detection, and mortality prediction.
BioMistral: A Collection of Open-Source Pretrained Large Language Models for Medical Domains
Paper • 2402.10373 • Published • Note: In this paper, we introduce BioMistral, an open-source LLM for the biomedical domain, utilizing Mistral as its foundation and further pre-trained on PubMed Central. We conduct a comprehensive evaluation of BioMistral on 10 medical QA datasets in English. We also explore lightweight models obtained through quantization and model merging approaches.
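Model merging, one of the lightweight approaches mentioned, can be illustrated with a plain weight-space average. This is only a sketch of linear merging over toy state dicts; BioMistral's actual merging schemes and quantization setup may differ:

```python
def merge_models(state_dicts, weights=None):
    """Linear weight-space merging: combine several fine-tuned checkpoints
    by averaging their parameters elementwise. Each state dict maps a
    parameter name to a flat list of floats of identical length."""
    n = len(state_dicts)
    weights = weights or [1.0 / n] * n  # default: uniform average
    merged = {}
    for name in state_dicts[0]:
        merged[name] = [
            sum(w * sd[name][i] for w, sd in zip(weights, state_dicts))
            for i in range(len(state_dicts[0][name]))
        ]
    return merged
```

Non-uniform `weights` let one checkpoint (e.g. the domain-adapted one) dominate the merge; setting a weight to zero excludes that model entirely.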
Gemini Goes to Med School: Exploring the Capabilities of Multimodal Large Language Models on Medical Challenge Problems & Hallucinations
Paper • 2402.07023 • Published • Note: In this work, we evaluate both open-source and Google’s new multimodal LLM called Gemini across medical reasoning, hallucination detection, and medical visual question answering tasks. We also perform a detailed analysis by medical subject and test type. We release a Python module for medical LLM evaluation.
RareBench: Can LLMs Serve as Rare Diseases Specialists?
Paper • 2402.06341 • Published • Note: In this work, we introduce RareBench, a novel benchmark for assessing the performance of large language models (LLMs) on rare disease diagnosis and analysis. We also provide a rich dataset of rare disease cases, and a novel method to generate dynamic prompts using a rare disease knowledge graph. Our results show that our method improves LLMs’ diagnostic accuracy and interpretability.
Benchmarking Large Language Models on Communicative Medical Coaching: a Novel System and Dataset
Paper • 2402.05547 • Published • Note: We introduce ChatCoach, a system that helps medical students improve their communication skills with patients. It uses two AI agents: one that acts as a patient and one that acts as a coach. The student can talk to the patient agent and get feedback from the coach agent in real time. We compare the performance of ChatGPT and Llama2 for this task.
SA-MDKIF: A Scalable and Adaptable Medical Domain Knowledge Injection Framework for Large Language Models
Paper • 2402.00474 • Published • Note: In this study, we present SA-MDKIF, a framework that aims to inject medical knowledge into LLMs through instruction tuning, thereby enabling adaptability for various downstream tasks. SA-MDKIF consists of two stages: skill training and skill adaptation. We train a skill router to integrate the acquired skills with LLMs during inference.
Multimodal Clinical Pseudo-notes for Emergency Department Prediction Tasks using Multiple Embedding Model for EHR (MEME)
Paper • 2402.00160 • Published • Note: In this work, we introduce MEME, an approach that views EHR as multimodal data. This approach incorporates "pseudo-notes", textual representations of tabular EHR concepts such as diagnoses and medications, and allows us to effectively employ LLMs for EHR representation.
Contrastive Learning and Mixture of Experts Enables Precise Vector Embeddings
Paper • 2401.15713 • Published • Note: In this paper, we target this issue by assembling niche datasets using co-citations as a similarity metric, focusing on biomedical domains. We employ two key strategies: (1) domain-specific fine-tuning, and (2) universal applicability with Mixture of Experts (MoE), adapting pretrained models with enforced routing for multiple domains simultaneously.
K-QA: A Real-World Medical Q&A Benchmark
Paper • 2401.14493 • Published • Note: We construct K-QA, a dataset containing 1,212 patient questions originating from real-world conversations held on K Health. We employ a panel of in-house physicians to answer and manually decompose a subset of K-QA into self-contained statements. Additionally, we formulate two NLI-based evaluation metrics. Finally, we use K-QA along with these metrics to evaluate several state-of-the-art models.
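Metrics of this NLI-based flavor can be sketched as follows. This is a hedged illustration: `entails` and `contradicts` stand in for a real NLI model's verdicts (stubbed here with substring checks), and the paper's exact metric definitions may differ:

```python
def entails(answer: str, statement: str) -> bool:
    # Stand-in for an NLI model's "entailment" verdict (illustrative stub).
    return statement.lower() in answer.lower()

def contradicts(answer: str, statement: str) -> bool:
    # Stand-in for an NLI model's "contradiction" verdict (illustrative stub).
    return ("not " + statement.lower()) in answer.lower()

def comprehensiveness(answer, gold_statements):
    """Fraction of physician-written gold statements the answer entails."""
    if not gold_statements:
        return 0.0
    return sum(entails(answer, s) for s in gold_statements) / len(gold_statements)

def hallucination_count(answer, gold_statements):
    """Number of gold statements the model's answer contradicts."""
    return sum(contradicts(answer, s) for s in gold_statements)
```

Swapping the stubs for a fine-tuned NLI classifier turns this into an automatic judge of whether a model's answer covers, and does not contradict, the decomposed gold statements.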
LongHealth: A Question Answering Benchmark with Long Clinical Documents
Paper • 2401.14490 • Published • Note: We present the LongHealth benchmark, comprising 20 detailed fictional patient cases across various diseases, with each case containing 5,090 to 6,754 words. The benchmark poses 400 multiple-choice questions in three categories: information extraction, negation, and sorting, challenging LLMs to extract and interpret information from large clinical documents.
PubTator 3.0: an AI-powered Literature Resource for Unlocking Biomedical Knowledge
Paper • 2401.11048 • Published • Note: PubTator 3.0 is a biomedical literature resource using state-of-the-art AI techniques to offer semantic and relation searches for key concepts like proteins, genetic variants, diseases, and chemicals. It provides over one billion entity and relation annotations across approximately 36 million PubMed abstracts and 6 million full-text articles.
Towards Conversational Diagnostic AI
Paper • 2401.05654 • Published • Note: In this work, we introduce AMIE (Articulate Medical Intelligence Explorer), an LLM-based AI system optimized for diagnostic dialogue. AMIE uses a novel self-play based simulated environment with automated feedback mechanisms for scaling learning across diverse disease conditions, specialties, and contexts.
PeFoMed: Parameter Efficient Fine-tuning on Multimodal Large Language Models for Medical Visual Question Answering
Paper • 2401.02797 • Published • Note: In this paper, we propose a parameter efficient framework for fine-tuning MLLM specifically tailored to Med-VQA applications, and empirically validate it on a public benchmark dataset. We outperform the GPT-4v model by a significant margin of 26% absolute accuracy on closed-ended questions, based on a human evaluation.
Generalist embedding models are better at short-context clinical semantic search than specialized embedding models
Paper • 2401.01943 • Published • Note: This study addresses these questions by constructing a textual dataset based on the ICD-10-CM code descriptions, widely used in US hospitals and containing many clinical terms, and their easily reproducible rephrasing. We then benchmarked existing embedding models, either generalist or specialized in the clinical domain.
MedSumm: A Multimodal Approach to Summarizing Code-Mixed Hindi-English Clinical Queries
Paper • 2401.01596 • Published • Note: This work introduces the task of multimodal medical question summarization for codemixed input in a low-resource setting. To address this gap, we introduce the Multimodal Medical Codemixed Question Summarization (MMCQS) dataset, which combines Hindi-English codemixed medical queries with visual aids.
Exploring the Effectiveness of Instruction Tuning in Biomedical Language Processing
Paper • 2401.00579 • Published • Note: Our study investigates the potential of instruction tuning for biomedical language processing, applying this technique to two general LLMs of substantial scale. We present a comprehensive, instruction-based model trained on a dataset that consists of approximately 200,000 instruction-focused samples.
Explanatory Argument Extraction of Correct Answers in Resident Medical Exams
Paper • 2312.00567 • Published • Note: We present a new dataset which (i) includes explanatory arguments for both correct and incorrect answers and (ii) is written by medical doctors to answer questions from the Spanish Residency Medical Exams. Furthermore, this new benchmark allows us to set up a novel extractive task which consists of identifying the explanation of the correct answer written by medical doctors.
MEDITRON-70B: Scaling Medical Pretraining for Large Language Models
Paper • 2311.16079 • Published • Note: In this work, we improve access to large-scale medical LLMs by releasing MEDITRON: a suite of open-source LLMs with 7B and 70B parameters adapted to the medical domain. MEDITRON builds on Llama-2, and extends pretraining on a comprehensively curated medical corpus, including selected PubMed articles, abstracts, and internationally-recognized medical guidelines.
BioLORD-2023: Semantic Textual Representations Fusing LLM and Clinical Knowledge Graph Insights
Paper • 2311.16075 • Published • Note: In this paper, we investigate the potential of Large Language Models to complement biomedical knowledge graphs in the training of semantic models and introduce BioLORD-2023, a state-of-the-art model for semantic textual similarity and biomedical concept representation designed for the clinical domain.
Overview of Current Applications of Large Language Models in Various Medical Specialities
Paper • 2311.12882 • Published • Note: This paper gives an overview of the latest applications of Large Language Models (LLMs) in the healthcare sector, highlighting their transformative role in enhancing medical care quality. We explore their utilization in various medical specialties, such as cancer diagnostics, dentistry, nephrology, dermatology, etc.
KBioXLM: A Knowledge-anchored Biomedical Multilingual Pretrained Language Model
Paper • 2311.11564 • Published • Note: We propose a model called KBioXLM, which transforms the multilingual pretrained model XLM-R into the biomedical domain using a knowledge-anchored approach. We construct a biomedical multilingual corpus by incorporating three granularities of knowledge alignment (entity, fact, and passage levels) into monolingual corpora.
MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning
Paper • 2311.10537 • Published • Note: We propose a novel Multi-disciplinary Collaboration (MC) framework for the medical domain that leverages LLM-based agents playing different roles and participating in a cooperative dialogue, which enhances the models' competencies and reasoning skills. This framework is training-free and intuitive.
HuatuoGPT-II, One-stage Training for Medical Adaption of LLMs
Paper • 2311.09774 • Published • Note: We propose to transform heterogeneous data, from both the pre-training and supervised stages, into a unified, simple input-output pair format. We validate the new protocol in domains where proprietary LLMs like ChatGPT perform relatively poorly, such as Traditional Chinese Medicine.
Autoregressive Language Models For Estimating the Entropy of Epic EHR Audit Logs
Paper • 2311.06401 • Published • Note: Existing techniques to measure the complexity of workflow through EHR audit logs involve time- or frequency-based cross-sectional aggregations that are unable to capture the full complexity of an EHR session. We evaluate the usage of transformer-based tabular LMs in measuring the entropy of action sequences within workflow and release the evaluated models publicly.
Relation Extraction in underexplored biomedical domains: A diversity-optimised sampling and synthetic data generation approach
Paper • 2311.06364 • Published • Note: We address the challenge of developing Relation Extraction models in biomedical areas, focusing on the sparsity of labeled data, particularly in the natural-products literature. We introduce a novel Greedy Maximum Entropy sampler to create a curated evaluation dataset and training sets using the LOTUS database.
ChiMed-GPT: A Chinese Medical Large Language Model with Full Training Regime and Better Alignment to Human Preferences
Paper • 2311.06025 • Published • Note: We propose ChiMed-GPT, a new benchmark LLM designed explicitly for the Chinese medical domain, with context length enlarged to 4,096 tokens. It undergoes a comprehensive training regime with pre-training, SFT, and RLHF, and is evaluated on real-world tasks including information extraction, question answering, and dialogue generation.
BioInstruct: Instruction Tuning of Large Language Models for Biomedical Natural Language Processing
Paper • 2310.19975 • Published • Note: We created BioInstruct, comprising 25,005 instructions to instruction-tune LLMs (LLaMA 1 & 2, 7B & 13B versions). The instructions were created by prompting the GPT-4 language model with three seed samples randomly drawn from 80 human-curated instructions. We then evaluated instruction-tuned LLMs on several BioNLP tasks.
MedEval: A Multi-Level, Multi-Task, and Multi-Domain Medical Benchmark for Language Model Evaluation
Paper • 2310.14088 • Published • Note: This study assesses the ability of state-of-the-art large language models (LLMs) including GPT-3.5, GPT-4, Falcon, and LLaMA 2 to identify patients with mild cognitive impairment (MCI) from discharge summaries and examines instances where the models' responses were misaligned with their reasoning.
Rather a Nurse than a Physician -- Contrastive Explanations under Investigation
Paper • 2310.11906 • Published • Note: Contrastive explanations, where one decision is explained in contrast to another, are supposed to be closer to how humans explain decisions. We fine-tune and extract explanations from 3 chat models. A comparison between human and model rationales, both in contrastive and non-contrastive settings, shows that humans do not necessarily explain in a contrastive manner.
xMEN: A Modular Toolkit for Cross-Lingual Medical Entity Normalization
Paper • 2310.11275 • Published • Note: We introduce xMEN, a modular system for cross-lingual medical entity normalization, which performs well in both low- and high-resource scenarios. When synonyms in the target language are scarce for a given terminology, we leverage English aliases via cross-lingual candidate generation. For candidate ranking, we incorporate a trainable cross-encoder model.
Emulating Human Cognitive Processes for Expert-Level Medical Question-Answering with Large Language Models
Paper • 2310.11266 • Published • Note: We introduce BooksMed, a novel framework based on a Large Language Model (LLM) which uniquely emulates human cognitive processes to deliver evidence-based and reliable responses, utilizing the GRADE (Grading of Recommendations, Assessment, Development, and Evaluations) framework to effectively quantify evidence strength.
JMedLoRA:Medical Domain Adaptation on Japanese Large Language Models using Instruction-tuning
Paper • 2310.10083 • Published • Note: We show the contribution of LoRA-based instruction-tuning to performance in Japanese medical question-answering tasks. Our findings suggest that LoRA-based instruction-tuning can partially incorporate domain-specific knowledge into LLMs, with larger models demonstrating more pronounced effects.
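The core LoRA idea behind such instruction-tuning, a frozen base weight plus a trainable low-rank update, can be sketched in a few lines. This is an illustrative toy on plain lists; real LoRA applies the update to transformer projection matrices via libraries such as PEFT:

```python
def lora_forward(x, W, A, B, alpha=16, r=1):
    """y = W x + (alpha / r) * B (A x).
    W is the frozen base weight; A (r x d_in) and B (d_out x r) form the
    low-rank update learned during fine-tuning; alpha/r scales the update."""
    def matvec(M, v):
        # Plain matrix-vector product over nested lists.
        return [sum(m * vv for m, vv in zip(row, v)) for row in M]
    base = matvec(W, x)            # frozen-path output
    update = matvec(B, matvec(A, x))  # low-rank adapter output
    return [b + (alpha / r) * u for b, u in zip(base, update)]
```

Because only A and B are trained (W stays frozen), the number of tunable parameters scales with the rank r rather than with the full weight matrix.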
BioT5: Enriching Cross-modal Integration in Biology with Chemical Knowledge and Natural Language Associations
Paper • 2310.07276 • Published • Note: We propose BioT5, a comprehensive pre-training framework that enriches cross-modal integration in biology with chemical knowledge and natural language associations. BioT5 utilizes SELFIES for robust molecular representations and extracts knowledge from the surrounding context of bio-entities in unstructured biological literature.
A Survey of Large Language Models for Healthcare: from Data, Technology, and Applications to Accountability and Ethics
Paper • 2310.05694 • Published • Note: This survey outlines the capabilities of the currently developed LLMs for Healthcare and explicates their development process, with the aim of providing an overview of the development roadmap from traditional Pretrained Language Models (PLMs) to LLMs.
AfriSpeech-200: Pan-African Accented Speech Dataset for Clinical and General Domain ASR
Paper • 2310.00274 • Published • Note: We release AfriSpeech, 200 hours of Pan-African English speech (67,577 clips from 2,463 unique speakers across 120 indigenous accents from 13 countries) for clinical and general domain ASR, along with a benchmark test set and publicly available pre-trained models achieving SOTA performance on the AfriSpeech benchmark.
MedEdit: Model Editing for Medical Question Answering with External Knowledge Bases
Paper • 2309.16035 • Published • Note: Our study delves into model editing utilizing in-context learning, aiming to improve LLM responses without the need for fine-tuning or retraining. Specifically, we propose a comprehensive retrieval strategy to extract medical facts from an external knowledge base, and then we incorporate them into the query prompt for the LLM.
Large Language Models and Control Mechanisms Improve Text Readability of Biomedical Abstracts
Paper • 2309.13202 • Published • Note: In this work, we investigate the ability of state-of-the-art large language models (LLMs) on the task of biomedical abstract simplification, using the publicly available dataset for plain language adaptation of biomedical abstracts (PLABA).
HealthFC: A Dataset of Health Claims for Evidence-Based Medical Fact-Checking
Paper • 2309.08503 • Published • Note: We introduce a dataset of 750 health-related claims, labeled for veracity by medical experts and backed with evidence from appropriate clinical studies. The dataset can be used for tasks related to automated fact-checking such as evidence retrieval, veracity prediction, and explanation generation.
Clinical Text Summarization: Adapting Large Language Models Can Outperform Human Experts
Paper • 2309.07430 • Published • Note: In this work, we employ domain adaptation methods on eight LLMs, spanning six datasets and four distinct summarization tasks: radiology reports, patient questions, progress notes, and doctor-patient dialogue. Our thorough quantitative assessment reveals trade-offs between models and adaptation methods.
Publicly Shareable Clinical Large Language Model Built on Synthetic Clinical Notes
Paper • 2309.00237 • Published • Note: In this article, we create synthetic large-scale clinical notes using publicly available case reports extracted from biomedical literature. We then use these synthetic notes to train our specialized clinical large language model, Asclepius. Our findings convincingly demonstrate that synthetic clinical notes can serve as viable substitutes for real ones.
BioCoder: A Benchmark for Bioinformatics Code Generation with Contextual Pragmatic Knowledge
Paper • 2308.16458 • Published • Note: We present BioCoder, a benchmark developed to evaluate existing pre-trained models in generating bioinformatics code. In relation to function-code generation, BioCoder covers potential package dependencies, class declarations, and global variables. It incorporates functions and methods in Python and Java from GitHub and the Rosalind Project.
MedAlign: A Clinician-Generated Dataset for Instruction Following with Electronic Medical Records
Paper • 2308.14089 • Published • Note: We introduce MedAlign, a benchmark dataset of 983 natural language instructions for EHR data. MedAlign is curated by 15 clinicians (7 specialities), includes clinician-written reference responses for 303 instructions, and provides 276 longitudinal EHRs for grounding instruction-response pairs.
CMB: A Comprehensive Medical Benchmark in Chinese
Paper • 2308.08833 • Published • Note: We propose a localized medical benchmark called CMB, a Comprehensive Medical Benchmark in Chinese, designed and rooted entirely within the native Chinese linguistic and cultural framework. While traditional Chinese medicine is integral to this evaluation, it does not constitute its entirety.
BIOptimus: Pre-training an Optimal Biomedical Language Model with Curriculum Learning for Named Entity Recognition
Paper • 2308.08625 • Published • Note: This paper aims to investigate different pre-training methods, such as pre-training the biomedical LM from scratch and pre-training it in a continued fashion. We also propose and evaluate initializing weights for new tokens by distilling existing weights from the BERT model inside the context where the tokens were found.
Large Language Models to Identify Social Determinants of Health in Electronic Health Records
Paper • 2308.06354 • Published • Note: This study researched the ability of large language models to extract SDoH from free text in EHRs, where they are most commonly documented, and explored the role of synthetic clinical text for improving the extraction of these scarcely documented, yet extremely valuable, clinical data.
Med-HALT: Medical Domain Hallucination Test for Large Language Models
Paper • 2307.15343 • Published • Note: This research paper focuses on the challenges posed by hallucinations in LLMs, particularly in the context of the medical domain. We propose a new benchmark and dataset, Med-HALT (Medical Domain Hallucination Test), designed specifically to evaluate and reduce hallucinations. Med-HALT includes two categories of tests: reasoning-based and memory-based hallucination tests.
Mental-LLM: Leveraging Large Language Models for Mental Health Prediction via Online Text Data
Paper • 2307.14385 • Published • Note: In this work, we present the first comprehensive evaluation of multiple LLMs, including Alpaca, Alpaca-LoRA, FLAN-T5, GPT-3.5, and GPT-4, on various mental health prediction tasks via online text data.
Towards Generalist Biomedical AI
Paper • 2307.14334 • Published • Note: Med-PaLM M is a large multimodal generative model that flexibly encodes and interprets biomedical data including clinical language, imaging, and genomics with the same set of model weights. Med-PaLM M reaches performance competitive with or exceeding the state of the art on all MultiMedBench tasks, often surpassing specialist models by a wide margin.
Making the Most Out of the Limited Context Length: Predictive Power Varies with Clinical Note Type and Note Section
Paper • 2307.07051 • Published • Note: We propose a framework to analyze the sections with high predictive power. Using MIMIC-III, we show that: 1) predictive power distribution is different between nursing notes and discharge notes and 2) combining different types of notes could improve performance when the context length is large.
Distilling Large Language Models for Biomedical Knowledge Extraction: A Case Study on Adverse Drug Events
Paper • 2307.06439 • Published • Note: In this paper, we study how LLMs can be used to scale biomedical knowledge curation. We find that while LLMs already possess decent competency in structuring biomedical text, by distillation into a task-specific student model through self-supervised learning, substantial gains can be attained over out-of-the-box LLMs.
EHRSHOT: An EHR Benchmark for Few-Shot Evaluation of Foundation Models
Paper • 2307.02028 • Published • Note: First, we publish a new dataset, EHRSHOT, which contains deidentified structured data from the electronic health records (EHRs) of 6,739 patients from Stanford Medicine. Second, we publish the weights of CLMBR-T-base, a 141M parameter clinical foundation model pretrained on the structured EHR data of 2.57M patients. Third, we define 15 few-shot clinical prediction tasks.
BioCPT: Contrastive Pre-trained Transformers with Large-scale PubMed Search Logs for Zero-shot Biomedical Information Retrieval
Paper • 2307.00589 • Published • 1Note We introduce BioCPT, a first-of-its-kind Contrastively Pre-trained Transformer model for zero-shot biomedical IR. To train BioCPT, we collected an unprecedented scale of 255 million user click logs from PubMed. With such data, we use contrastive learning to train a pair of closely-integrated retriever and re-ranker.
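The contrastive objective behind retriever training of this kind can be sketched with in-batch negatives: each query's clicked document is its positive, and the other documents in the batch serve as negatives. The toy NumPy implementation below illustrates that objective only; it is not BioCPT's training code, and the batch size, dimensions, and temperature are arbitrary:

```python
import numpy as np

def in_batch_contrastive_loss(queries, docs, temperature=0.05):
    """InfoNCE-style loss with in-batch negatives: the positive for
    query i is docs[i]; all other docs in the batch are negatives."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    d = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    sims = q @ d.T / temperature              # (B, B) similarity matrix
    sims -= sims.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(sims) / np.exp(sims).sum(axis=1, keepdims=True)
    # cross-entropy against the diagonal (each query's paired document)
    return float(-np.log(np.diag(probs)).mean())

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))
loss_random = in_batch_contrastive_loss(q, rng.normal(size=(4, 8)))
loss_matched = in_batch_contrastive_loss(q, q)  # perfectly aligned pairs
```

Training pushes the loss toward the `loss_matched` regime, pulling paired query and document embeddings together while pushing apart the in-batch negatives.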
How far is Language Model from 100% Few-shot Named Entity Recognition in Medical Domain
Paper • 2307.00186 • Published • 1Note This paper provides a thorough investigation comparing the performance of LMs on medical few-shot NER, asking how far LMs are from 100% few-shot NER in the medical domain, and further explores an effective entity recognizer that helps improve NER performance.
Biomedical Language Models are Robust to Sub-optimal Tokenization
Paper • 2306.17649 • Published • 1Note In this work, we first find that standard open-domain and biomedical tokenizers are largely unable to segment biomedical terms into meaningful components. But surprisingly, we find that pre-training a biomedical LM using a more accurate biomedical tokenizer does not improve the entity representation quality of a language model.
CamemBERT-bio: a Tasty French Language Model Better for your Health
Paper • 2306.15550 • Published • 3Note We propose a new French public biomedical dataset on which we have continued the pre-training of CamemBERT. Thus, we introduce a first version of CamemBERT-bio, a specialized public model for the French biomedical domain that shows an average improvement of 2.54 F1 points across different biomedical named entity recognition tasks.
Radiology-GPT: A Large Language Model for Radiology
Paper • 2306.08666 • Published • 1Note We introduce Radiology-GPT, a large language model for radiology. Using an instruction tuning approach on an extensive dataset of radiology domain knowledge, Radiology-GPT demonstrates superior performance compared to general language models such as StableLM, Dolly and LLaMA.
Mol-Instructions: A Large-Scale Biomolecular Instruction Dataset for Large Language Models
Paper • 2306.08018 • Published • 2Note We introduce Mol-Instructions, a meticulously curated, comprehensive instruction dataset expressly designed for the biomolecular realm. Mol-Instructions is composed of three pivotal components: molecule-oriented instructions, protein-oriented instructions, and biomolecular text instructions.
Multilingual Clinical NER: Translation or Cross-lingual Transfer?
Paper • 2306.04384 • Published • 1Note This paper compares cross-lingual transfer with translation-based alternatives for performing clinical NER in French and in German without any training data in those languages. To this end, we release MedNERF, a medical NER test set extracted from French drug prescriptions and annotated with the same guidelines as an English dataset.
ACI-BENCH: a Novel Ambient Clinical Intelligence Dataset for Benchmarking Automatic Visit Note Generation
Paper • 2306.02022 • Published • 1Note In this paper, we present the Ambient Clinical Intelligence Benchmark (ACI-BENCH) corpus, the largest dataset to date tackling the problem of AI-assisted note generation from visit dialogue. We also present the benchmark performances of several common state-of-the-art approaches.
LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day
Paper • 2306.00890 • Published • 9Note We propose a cost-efficient approach for training a vision-language conversational assistant that can answer open-ended research questions about biomedical images. The key idea is to leverage a large-scale, broad-coverage biomedical figure-caption dataset extracted from PubMed Central, and then use GPT-4 to generate instruction-following data from the captions via self-instruction.
BiomedGPT: A Unified and Generalist Biomedical Generative Pre-trained Transformer for Vision, Language, and Multimodal Tasks
Paper • 2305.17100 • Published • 1Note In this paper, we introduce BiomedGPT, a unified and generalist Biomedical Generative Pre-trained Transformer for vision, language, and multimodal tasks.
Towards Expert-Level Medical Question Answering with Large Language Models
Paper • 2305.09617 • Published • 4Note We present Med-PaLM 2, which bridges these gaps by leveraging a combination of base LLM improvements (PaLM 2), medical domain finetuning, and prompting strategies including a novel ensemble refinement approach.
Dr. LLaMA: Improving Small Language Models in Domain-Specific QA via Generative Data Augmentation
Paper • 2305.07804 • Published • 2Note In this paper, we introduce Dr. LLaMA, a method for improving SLMs through generative data augmentation using LLMs, focusing on medical question-answering tasks and the PubMedQA dataset. Our findings indicate that LLMs effectively refine and diversify existing question-answer pairs.
RadAdapt: Radiology Report Summarization via Lightweight Domain Adaptation of Large Language Models
Paper • 2305.01146 • Published • 1Note We systematically investigate lightweight strategies to adapt large language models (LLMs) for the task of radiology report summarization (RRS). Our results on the MIMIC-III dataset consistently demonstrate best performance by maximally adapting to the task via pretraining on clinical text and parameter-efficient fine-tuning on RRS examples.
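The parameter-efficient fine-tuning the paper investigates can be illustrated with a minimal LoRA-style layer: the pretrained weight stays frozen, and only a low-rank update is trained. This is a sketch of the general technique, not the paper's implementation; the dimensions, rank, and scaling are arbitrary:

```python
import numpy as np

class LoRALinear:
    """Frozen base weight plus a trainable low-rank update (LoRA-style)."""
    def __init__(self, W, rank=4, alpha=8, seed=0):
        rng = np.random.default_rng(seed)
        self.W = W                                  # frozen pretrained weight (d_out, d_in)
        self.A = rng.normal(scale=0.01, size=(rank, W.shape[1]))  # trainable down-projection
        self.B = np.zeros((W.shape[0], rank))       # trainable up-projection, zero-initialised
        self.scale = alpha / rank

    def __call__(self, x):
        # base output plus scaled low-rank correction
        return self.W @ x + self.scale * (self.B @ (self.A @ x))

    def trainable_params(self):
        return self.A.size + self.B.size

W = np.random.default_rng(1).normal(size=(64, 64))
layer = LoRALinear(W, rank=4)
x = np.ones(64)
out = layer(x)  # equals W @ x at init, since B starts at zero
frac = layer.trainable_params() / W.size  # 512 / 4096 = 12.5% of this layer
```

Because `B` is zero-initialised, the adapted layer reproduces the frozen base exactly at the start of fine-tuning; only `A` and `B` receive gradient updates.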
A Biomedical Entity Extraction Pipeline for Oncology Health Records in Portuguese
Paper • 2304.08999 • Published • 2Note In this paper, we present the approach we developed to extract procedures, drugs, and diseases from oncology health records written in European Portuguese. Since there was no annotated corpus for biomedical entity extraction in Portuguese prior to this work, we also present the strategy we followed in annotating the corpus for the development of the models.
DrBERT: A Robust Pre-trained Model in French for Biomedical and Clinical domains
Paper • 2304.00958 • Published • 1Note In this paper, we propose an original study of PLMs in the medical domain for the French language. We also release the first specialized PLMs for the biomedical field in French, called DrBERT, as well as the largest corpus of medical data under free license on which these models are trained.
ChatDoctor: A Medical Chat Model Fine-tuned on LLaMA Model using Medical Domain Knowledge
Paper • 2303.14070 • Published • 7Note We collected more than 700 diseases and their corresponding symptoms, recommended medications, and required medical tests, and then generated 5K doctor-patient conversations. Models fine-tuned on these data show great potential to understand patients' needs, provide informed advice, and offer valuable assistance in a variety of medical-related fields.
Capabilities of GPT-4 on Medical Challenge Problems
Paper • 2303.13375 • Published • 1Note We present a comprehensive evaluation of GPT-4 on medical competency examinations and benchmark datasets. Our results show that GPT-4, without any specialized prompt crafting, exceeds the passing score on USMLE by over 20 points and outperforms earlier general-purpose models (GPT-3.5) as well as models specifically fine-tuned on medical knowledge (Med-PaLM, a tuned version of Flan-PaLM 540B).
MEDBERT.de: A Comprehensive German BERT Model for the Medical Domain
Paper • 2303.08179 • Published • 1Note The model has been trained on a large corpus of 4.7 million German medical documents and has been shown to achieve new state-of-the-art performance on eight different medical benchmarks covering a wide range of disciplines and medical document types. In addition to evaluating the model, this paper also conducts an in-depth analysis of its capabilities.
Almanac: Retrieval-Augmented Language Models for Clinical Medicine
Paper • 2303.01229 • Published • 1Note Large language models have a tendency to generate factually incorrect and sometimes even toxic statements. By enabling these models to access external point-of-care tools in response to physician queries, we demonstrate significantly improved factual grounding, helpfulness, and safety in a variety of clinical scenarios.
Large-Scale Domain-Specific Pretraining for Biomedical Vision-Language Processing
Paper • 2303.00915 • Published • 4Note In this paper, we conducted by far the largest study on biomedical VLP, using 15 million figure-caption pairs extracted from biomedical research articles in PubMed Central. BiomedCLIP established a new state of the art on a wide range of standard datasets, substantially outperforming prior VLP approaches.
Do We Still Need Clinical Language Models?
Paper • 2302.08091 • Published • 3Note We show that relatively small specialized clinical models substantially outperform all in-context learning approaches, even when finetuned on limited annotated data. Further, we find that pretraining on clinical tokens allows for smaller, more parameter-efficient models that either match or outperform much larger language models trained on general text.
EHRSQL: A Practical Text-to-SQL Benchmark for Electronic Health Records
Paper • 2301.07695 • Published • 1Note We present a new text-to-SQL dataset for electronic health records (EHRs). The utterances were collected from 222 hospital staff, including physicians, nurses, insurance review and health records teams, and more. Our dataset poses unique challenges: systems must 1) generate SQL queries, 2) understand various time expressions, and 3) distinguish whether a given question is answerable.
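The kind of question-to-SQL mapping the benchmark targets, including time expressions, can be illustrated end to end with an in-memory SQLite table. The table name, columns, and question below are hypothetical, not drawn from the benchmark:

```python
import sqlite3

# Hypothetical question: "How many patients were admitted since 2023-01-01?"
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE admissions (patient_id INTEGER, admit_time TEXT)")
conn.executemany(
    "INSERT INTO admissions VALUES (?, ?)",
    [(1, "2022-11-03"), (2, "2023-02-14"), (3, "2023-03-01")],
)

# The time expression "since 2023-01-01" becomes a WHERE-clause bound.
sql = """
SELECT COUNT(DISTINCT patient_id)
FROM admissions
WHERE admit_time >= '2023-01-01'
"""
count = conn.execute(sql).fetchone()[0]  # counts the two 2023 admissions
```

An unanswerable question (e.g. one referencing a column the schema lacks) would have no valid SQL at all, which is the third challenge the dataset poses.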
Large Language Models Encode Clinical Knowledge
Paper • 2212.13138 • Published • 3Note We present MultiMedQA, a benchmark combining six existing open question answering datasets spanning professional medical exams, research, and consumer queries; and HealthSearchQA, a new free-response dataset of medical questions searched online. We evaluate PaLM (a 540-billion parameter LLM) and its instruction-tuned variant, Flan-PaLM, on MultiMedQA.
Scientific and Creative Analogies in Pretrained Language Models
Paper • 2211.15268 • Published • 1Note This paper examines the encoding of analogy in large-scale pretrained language models. Existing analogy datasets typically focus on a limited set of analogical relations, with a high similarity of the two domains between which the analogy holds. On the other hand, SCAN contains systematic mappings of multiple attributes and relational structures across dissimilar domains.
RoentGen: Vision-Language Foundation Model for Chest X-ray Generation
Paper • 2211.12737 • Published • 2Note We fine-tuned a diffusion model on a corpus of publicly available chest x-rays (CXR) and their corresponding radiology (text) reports. We present evidence that the resulting model is able to create visually convincing, diverse synthetic CXR images, and that the output can be controlled by using free-form text prompts including radiology-specific language.
A Large-Scale Dataset for Biomedical Keyphrase Generation
Paper • 2211.12124 • Published • 1Note We introduce kp-biomed, the first large-scale biomedical keyphrase generation dataset with more than 5M documents collected from PubMed abstracts. We train and release several generative models and conduct a series of experiments showing that using large-scale datasets significantly improves performance for present and absent keyphrase generation.
AF Adapter: Continual Pretraining for Building Chinese Biomedical Language Model
Paper • 2211.11363 • Published • 1Note Sequential task training may cause catastrophic forgetting, so we propose a continual pretraining method for the BERT-based model. Despite training only 3% of model parameters, our method achieves better-than-SOTA performance on Chinese biomedical tasks.
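Adapter-style continual pretraining of this sort keeps the backbone frozen and trains only small inserted modules. The sketch below illustrates the general bottleneck-adapter pattern (down-project, nonlinearity, up-project, residual) with arbitrary sizes; it is not the paper's AF Adapter architecture:

```python
import numpy as np

def adapter_forward(h, W_down, W_up):
    """Bottleneck adapter: down-project, ReLU, up-project, residual add.
    Only W_down/W_up are trained; the backbone producing h stays frozen."""
    z = np.maximum(W_down @ h, 0.0)
    return h + W_up @ z

hidden, bottleneck = 768, 24
rng = np.random.default_rng(0)
W_down = rng.normal(scale=0.01, size=(bottleneck, hidden))
W_up = np.zeros((hidden, bottleneck))  # zero init: adapter starts as identity
h = rng.normal(size=hidden)
out = adapter_forward(h, W_down, W_up)  # identical to h before any training

# trainable share relative to one frozen hidden x hidden weight matrix
frac = (W_down.size + W_up.size) / (hidden * hidden)  # 0.0625 here
```

The trainable fraction scales with the bottleneck width, which is how adapter methods keep the updated parameter count to a few percent of the full model.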
Galactica: A Large Language Model for Science
Paper • 2211.09085 • Published • 3Note In this paper we introduce Galactica: a large language model that can store, combine and reason about scientific knowledge. It sets a new state of the art on downstream tasks such as PubMedQA (77.6%) and MedMCQA dev (52.9%).
BioLORD: Learning Ontological Representations from Definitions (for Biomedical Concepts and their Textual Descriptions)
Paper • 2210.11892 • Published • 2Note In this work, we propose a new method for learning vector representations of biomedical terms, grounded in definitions and descriptions from a knowledge graph. Thanks to this grounding, our model produces more semantic concept representations than SapBERT, matching the hierarchical structure of ontologies more closely. The model also generalizes to clinical sentence similarity (STS).