Collections including paper arxiv:2408.00714

- SAM 2: Segment Anything in Images and Videos
  Paper • 2408.00714 • Published • 109
- Rethinking Open-Vocabulary Segmentation of Radiance Fields in 3D Space
  Paper • 2408.07416 • Published • 6
- SMITE: Segment Me In TimE
  Paper • 2410.18538 • Published • 15
- ReferEverything: Towards Segmenting Everything We Can Speak of in Videos
  Paper • 2410.23287 • Published • 19

- SAM 2: Segment Anything in Images and Videos
  Paper • 2408.00714 • Published • 109
- Surgical SAM 2: Real-time Segment Anything in Surgical Video by Efficient Frame Pruning
  Paper • 2408.07931 • Published • 19
- DELTA: Dense Efficient Long-range 3D Tracking for any video
  Paper • 2410.24211 • Published • 8

- Fast Matrix Multiplications for Lookup Table-Quantized LLMs
  Paper • 2407.10960 • Published • 12
- ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities
  Paper • 2407.14482 • Published • 26
- EVLM: An Efficient Vision-Language Model for Visual Understanding
  Paper • 2407.14177 • Published • 43
- Knowledge Mechanisms in Large Language Models: A Survey and Perspective
  Paper • 2407.15017 • Published • 34

- PAS: Data-Efficient Plug-and-Play Prompt Augmentation System
  Paper • 2407.06027 • Published • 8
- SpreadsheetLLM: Encoding Spreadsheets for Large Language Models
  Paper • 2407.09025 • Published • 130
- Toto: Time Series Optimized Transformer for Observability
  Paper • 2407.07874 • Published • 30
- SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers
  Paper • 2407.09413 • Published • 10

- Depth Anything V2
  Paper • 2406.09414 • Published • 95
- An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels
  Paper • 2406.09415 • Published • 50
- Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion
  Paper • 2406.04338 • Published • 34
- SAM 2: Segment Anything in Images and Videos
  Paper • 2408.00714 • Published • 109

- MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training
  Paper • 2311.17049 • Published • 1
- DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
  Paper • 2405.04434 • Published • 14
- A Study of Autoregressive Decoders for Multi-Tasking in Computer Vision
  Paper • 2303.17376 • Published
- Sigmoid Loss for Language Image Pre-Training
  Paper • 2303.15343 • Published • 5