Open-source embeddings and LLMs outperform Gemini and OpenAI for Web Navigation while being faster and cheaper • Jun 21 • 7
Introducing BlindChat, an open-source and privacy-by-design Conversational AI fully in-browser • Sep 22, 2023 • 1
AI Total Cost of Ownership Calculator: Evaluate the cost of in-house AI deployment vs AI APIs • Sep 20, 2023 • 1
LLaVA-Critic: Learning to Evaluate Multimodal Models • Paper • 2410.02712 • Published about 1 month ago • 34
Law of the Weakest Link: Cross Capabilities of Large Language Models • Paper • 2409.19951 • Published Sep 30 • 53
Attention Prompting on Image for Large Vision-Language Models • Paper • 2409.17143 • Published Sep 25 • 7
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning • Paper • 2409.12183 • Published Sep 18 • 36
A Controlled Study on Long Context Extension and Generalization in LLMs • Paper • 2409.12181 • Published Sep 18 • 43
MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans? • Paper • 2408.13257 • Published Aug 23 • 25
Building and better understanding vision-language models: insights and future directions • Paper • 2408.12637 • Published Aug 22 • 115
To Code, or Not To Code? Exploring Impact of Code in Pre-training • Paper • 2408.10914 • Published Aug 20 • 40
Amuro & Char: Analyzing the Relationship between Pre-Training and Fine-Tuning of Large Language Models • Paper • 2408.06663 • Published Aug 13 • 15
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters • Paper • 2408.03314 • Published Aug 6 • 33
Visual Riddles: a Commonsense and World Knowledge Challenge for Large Vision and Language Models • Paper • 2407.19474 • Published Jul 28 • 22
MindSearch: Mimicking Human Minds Elicits Deep AI Searcher • Paper • 2407.20183 • Published Jul 29 • 37
AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks? • Paper • 2407.15711 • Published Jul 22 • 9
InverseCoder: Unleashing the Power of Instruction-Tuned Code LLMs with Inverse-Instruct • Paper • 2407.05700 • Published Jul 8 • 9
Planetarium: A Rigorous Benchmark for Translating Text to Structured Planning Languages • Paper • 2407.03321 • Published Jul 3 • 15
Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems • Paper • 2407.01370 • Published Jul 1 • 85