-
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Paper • 2402.04252 • Published • 21 -
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
Paper • 2402.03749 • Published • 9 -
ScreenAI: A Vision-Language Model for UI and Infographics Understanding
Paper • 2402.04615 • Published • 31 -
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
Paper • 2402.05008 • Published • 19
Collections
Discover the best community collections!
Collections including paper arxiv:2403.15371
-
Can large language models explore in-context?
Paper • 2403.15371 • Published • 30 -
Advancing LLM Reasoning Generalists with Preference Trees
Paper • 2404.02078 • Published • 41 -
Long-context LLMs Struggle with Long In-context Learning
Paper • 2404.02060 • Published • 33 -
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
Paper • 2404.03715 • Published • 58
-
Can large language models explore in-context?
Paper • 2403.15371 • Published • 30 -
Long-context LLMs Struggle with Long In-context Learning
Paper • 2404.02060 • Published • 33 -
PIQA: Reasoning about Physical Commonsense in Natural Language
Paper • 1911.11641 • Published • 2 -
AQuA: A Benchmarking Tool for Label Quality Assessment
Paper • 2306.09467 • Published • 1
-
Can large language models explore in-context?
Paper • 2403.15371 • Published • 30 -
GaussianCube: Structuring Gaussian Splatting using Optimal Transport for 3D Generative Modeling
Paper • 2403.19655 • Published • 14 -
WavLLM: Towards Robust and Adaptive Speech Large Language Model
Paper • 2404.00656 • Published • 5 -
Enabling Memory Safety of C Programs using LLMs
Paper • 2404.01096 • Published • 1
-
Can large language models explore in-context?
Paper • 2403.15371 • Published • 30 -
LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement
Paper • 2403.15042 • Published • 24 -
BLINK: Multimodal Large Language Models Can See but Not Perceive
Paper • 2404.12390 • Published • 23 -
Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models
Paper • 2404.12387 • Published • 35
-
Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in Transformer Models
Paper • 2311.00871 • Published • 2 -
Can large language models explore in-context?
Paper • 2403.15371 • Published • 30 -
Data Distributional Properties Drive Emergent In-Context Learning in Transformers
Paper • 2205.05055 • Published • 2 -
Long-context LLMs Struggle with Long In-context Learning
Paper • 2404.02060 • Published • 33
-
Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset
Paper • 2403.09029 • Published • 52 -
LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression
Paper • 2403.12968 • Published • 20 -
RAFT: Adapting Language Model to Domain Specific RAG
Paper • 2403.10131 • Published • 63 -
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
Paper • 2403.09629 • Published • 54
-
Contrastive Decoding Improves Reasoning in Large Language Models
Paper • 2309.09117 • Published • 37 -
Chain-of-Thought Reasoning Without Prompting
Paper • 2402.10200 • Published • 91 -
MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
Paper • 2403.14624 • Published • 50 -
Chain of Thought Empowers Transformers to Solve Inherently Serial Problems
Paper • 2402.12875 • Published • 2
-
How Far Are We from Intelligent Visual Deductive Reasoning?
Paper • 2403.04732 • Published • 18 -
MoAI: Mixture of All Intelligence for Large Language and Vision Models
Paper • 2403.07508 • Published • 70 -
DragAnything: Motion Control for Anything using Entity Representation
Paper • 2403.07420 • Published • 11 -
Learning and Leveraging World Models in Visual Representation Learning
Paper • 2403.00504 • Published • 25