-
MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries
Paper • 2401.15391 • Published • 6 -
Long-form factuality in large language models
Paper • 2403.18802 • Published • 23 -
JudgeLM: Fine-tuned Large Language Models are Scalable Judges
Paper • 2310.17631 • Published • 31 -
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models
Paper • 2310.08491 • Published • 50
Collections
Discover the best community collections!
Collections including paper arxiv:2405.01535
-
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
Paper • 2402.19427 • Published • 50 -
Beyond Language Models: Byte Models are Digital World Simulators
Paper • 2402.19155 • Published • 46 -
StarCoder 2 and The Stack v2: The Next Generation
Paper • 2402.19173 • Published • 126 -
Simple linear attention language models balance the recall-throughput tradeoff
Paper • 2402.18668 • Published • 18
-
LoRA+: Efficient Low Rank Adaptation of Large Models
Paper • 2402.12354 • Published • 5 -
The FinBen: An Holistic Financial Benchmark for Large Language Models
Paper • 2402.12659 • Published • 13 -
TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization
Paper • 2402.13249 • Published • 10 -
TrustLLM: Trustworthiness in Large Language Models
Paper • 2401.05561 • Published • 62
-
Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping
Paper • 2402.14083 • Published • 43 -
Linear Transformers are Versatile In-Context Learners
Paper • 2402.14180 • Published • 6 -
Training-Free Long-Context Scaling of Large Language Models
Paper • 2402.17463 • Published • 19 -
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Paper • 2402.17764 • Published • 573
-
OLMo: Accelerating the Science of Language Models
Paper • 2402.00838 • Published • 75 -
OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
Paper • 2402.01739 • Published • 26 -
LLM Agent Operating System
Paper • 2403.16971 • Published • 63 -
Poro 34B and the Blessing of Multilinguality
Paper • 2404.01856 • Published • 12
-
Fusion-Eval: Integrating Evaluators with LLMs
Paper • 2311.09204 • Published • 5 -
Cappy: Outperforming and Boosting Large Multi-Task LMs with a Small Scorer
Paper • 2311.06720 • Published • 6 -
Safurai 001: New Qualitative Approach for Code LLM Evaluation
Paper • 2309.11385 • Published • 2 -
Assessment of Pre-Trained Models Across Languages and Grammars
Paper • 2309.11165 • Published • 1
-
TinyLlama: An Open-Source Small Language Model
Paper • 2401.02385 • Published • 81 -
MM-LLMs: Recent Advances in MultiModal Large Language Models
Paper • 2401.13601 • Published • 41 -
SliceGPT: Compress Large Language Models by Deleting Rows and Columns
Paper • 2401.15024 • Published • 63 -
Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling
Paper • 2401.16380 • Published • 46
-
UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs
Paper • 2311.09257 • Published • 43 -
VideoPoet: A Large Language Model for Zero-Shot Video Generation
Paper • 2312.14125 • Published • 41 -
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones
Paper • 2312.16862 • Published • 28 -
VideoDrafter: Content-Consistent Multi-Scene Video Generation with LLM
Paper • 2401.01256 • Published • 17
-
G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment
Paper • 2303.16634 • Published • 1 -
miracl/miracl-corpus
Viewer • Updated • 77.2M • 5.12k • 39 -
Judging LLM-as-a-judge with MT-Bench and Chatbot Arena
Paper • 2306.05685 • Published • 25 -
How is ChatGPT's behavior changing over time?
Paper • 2307.09009 • Published • 22
-
JudgeLM: Fine-tuned Large Language Models are Scalable Judges
Paper • 2310.17631 • Published • 31 -
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models
Paper • 2310.08491 • Published • 50 -
Generative Judge for Evaluating Alignment
Paper • 2310.05470 • Published • 1 -
Calibrating LLM-Based Evaluator
Paper • 2309.13308 • Published • 10