- OLAPH: Improving Factuality in Biomedical Long-form Question Answering
  Paper • 2405.12701 • Published • 1
- COGNET-MD, an evaluation framework and dataset for Large Language Model benchmarks in the medical domain
  Paper • 2405.10893 • Published
- Adapting Abstract Meaning Representation Parsing to the Clinical Narrative -- the SPRING THYME parser
  Paper • 2405.09153 • Published
- MedConceptsQA -- Open Source Medical Concepts QA Benchmark
  Paper • 2405.07348 • Published
Collections including paper arxiv:2404.18416
- Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models
  Paper • 2405.01535 • Published • 101
- StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation
  Paper • 2405.01434 • Published • 44
- WildChat: 1M ChatGPT Interaction Logs in the Wild
  Paper • 2405.01470 • Published • 53
- A Careful Examination of Large Language Model Performance on Grade School Arithmetic
  Paper • 2405.00332 • Published • 24