Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability Paper • 2411.19943 • Published Nov 29, 2024 • 57
OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation Paper • 2412.02592 • Published Dec 3, 2024 • 21
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking Paper • 2501.04519 • Published 9 days ago • 229
LLM4SR: A Survey on Large Language Models for Scientific Research Paper • 2501.04306 • Published 10 days ago • 33
Agent Laboratory: Using LLM Agents as Research Assistants Paper • 2501.04227 • Published 10 days ago • 77
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token Paper • 2501.03895 • Published 10 days ago • 48
Personalized Graph-Based Retrieval for Large Language Models Paper • 2501.02157 • Published 14 days ago • 28
OneKE: A Dockerized Schema-Guided LLM Agent-based Knowledge Extraction System Paper • 2412.20005 • Published 21 days ago • 17
🪐 SmolLM Collection A series of smol LLMs: 135M, 360M and 1.7B. We release base and Instruct models as well as the training corpus and some WebGPU demos • 12 items • Updated 26 days ago • 208
Article ScreenSpot-Pro: GUI Grounding for Professional High-Resolution Computer Use By Ziyang • 14 days ago • 12
CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings Paper • 2501.01257 • Published 15 days ago • 47
Article Introducing Observers: AI Observability with Hugging Face datasets through a lightweight SDK By davidberenstein1957 • Nov 21, 2024 • 35
Article 🐺🐦‍⬛ LLM Comparison/Test: DeepSeek-V3, QVQ-72B-Preview, Falcon3 10B, Llama 3.3 70B, Nemotron 70B in my updated MMLU-Pro CS benchmark By wolfram • 15 days ago • 37
Executable Code Actions Elicit Better LLM Agents Paper • 2402.01030 • Published Feb 1, 2024 • 43
Open LLM Leaderboard best models ❤️‍🔥 Collection A daily uploaded list of models with best evaluations on the LLM leaderboard: • 65 items • Updated 44 minutes ago • 508
GTE models Collection General Text Embedding Models Released by Tongyi Lab of Alibaba Group • 19 items • Updated 28 days ago • 19
OmniEval: An Omnidirectional and Automatic RAG Evaluation Benchmark in Financial Domain Paper • 2412.13018 • Published Dec 17, 2024 • 41