arxiv:2411.14199

OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs

Published on Nov 21
· Submitted by akariasai on Nov 22
Abstract

Scientific progress depends on researchers' ability to synthesize the growing body of literature. Can large language models (LMs) assist scientists in this task? We introduce OpenScholar, a specialized retrieval-augmented LM that answers scientific queries by identifying relevant passages from 45 million open-access papers and synthesizing citation-backed responses. To evaluate OpenScholar, we develop ScholarQABench, the first large-scale multi-domain benchmark for literature search, comprising 2,967 expert-written queries and 208 long-form answers across computer science, physics, neuroscience, and biomedicine. On ScholarQABench, OpenScholar-8B outperforms GPT-4o by 5% and PaperQA2 by 7% in correctness, despite being a smaller, open model. While GPT-4o hallucinates citations 78 to 90% of the time, OpenScholar achieves citation accuracy on par with human experts. OpenScholar's datastore, retriever, and self-feedback inference loop also improve off-the-shelf LMs: for instance, OpenScholar-GPT4o improves GPT-4o's correctness by 12%. In human evaluations, experts preferred OpenScholar-8B and OpenScholar-GPT4o responses over expert-written ones 51% and 70% of the time, respectively, compared to GPT-4o's 32%. We open-source all of our code, models, datastore, data, and a public demo.

Community

Paper author · Paper submitter

OpenScholar is a new retrieval-augmented LM designed for scientific literature synthesis. Built upon a datastore of 45 million open-access papers, a trained retriever, a reranker, an 8B LM, and a self-feedback retrieval-augmented generation pipeline, it outperforms GPT-4o as well as production systems such as Perplexity at literature synthesis. A public demo is available at https://openscholar.allen.ai/.
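The pipeline described above (retrieve, rerank, generate, then iterate with self-feedback) can be sketched in outline. This is a minimal illustrative sketch, not OpenScholar's actual API: every function here (`retrieve`, `rerank`, `generate`, `critique`) is a hypothetical placeholder standing in for the corresponding component of the system.

```python
# Hypothetical sketch of a self-feedback retrieval-augmented generation loop,
# loosely modeled on the pipeline described above. All functions below are
# placeholders, not the real OpenScholar implementation.

def retrieve(query, k=20):
    # Placeholder: a dense retriever would search the 45M-paper datastore.
    return [f"passage-{i} for {query!r}" for i in range(k)]

def rerank(query, passages, top_n=5):
    # Placeholder: a cross-encoder reranker would score and sort passages.
    return passages[:top_n]

def generate(query, passages):
    # Placeholder: the LM would write a citation-backed answer from passages.
    return f"Answer to {query!r} citing {len(passages)} passages"

def critique(answer):
    # Placeholder: the LM would emit feedback items (e.g. missing evidence).
    # An empty list means the answer needs no further revision.
    return []

def answer_with_self_feedback(query, max_rounds=3):
    """Generate an answer, then iteratively refine it using self-feedback."""
    passages = rerank(query, retrieve(query))
    answer = generate(query, passages)
    for _ in range(max_rounds):
        feedback = critique(answer)
        if not feedback:
            break
        # Retrieve additional evidence for each feedback item, then regenerate.
        for item in feedback:
            passages += rerank(item, retrieve(item))
        answer = generate(query, passages)
    return answer
```

The key design point carried over from the description above is the feedback loop: instead of answering in one pass, the model critiques its own draft and triggers further retrieval before regenerating.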

