arxiv:2503.23229

Citegeist: Automated Generation of Related Work Analysis on the arXiv Corpus

Published on Mar 29

Authors:

Abstract

Large Language Models provide significant new opportunities for the generation of high-quality written works. However, their employment in the research community is inhibited by their tendency to hallucinate invalid sources and lack of direct access to a knowledge base of relevant scientific articles. In this work, we present Citegeist: An application pipeline using dynamic Retrieval Augmented Generation (RAG) on the arXiv Corpus to generate a related work section and other citation-backed outputs. For this purpose, we employ a mixture of embedding-based similarity matching, summarization, and multi-stage filtering. To adapt to the continuous growth of the document base, we also present an optimized way of incorporating new and modified papers. To enable easy utilization in the scientific community, we release both, a website (https://citegeist.org), as well as an implementation harness that works with several different LLM implementations.

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

Your need to confirm your account before you can post a new comment.

· Sign up or log in to comment

Upvote

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2503.23229 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2503.23229 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2503.23229 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.