Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2408.00714

image understanding

SAM 2: Segment Anything in Images and Videos

Paper • 2408.00714 • Published Aug 1, 2024 • 109

SAM 2: Segment Anything in Images and Videos

Paper • 2408.00714 • Published Aug 1, 2024 • 109
Rethinking Open-Vocabulary Segmentation of Radiance Fields in 3D Space

Paper • 2408.07416 • Published Aug 14, 2024 • 6
SMITE: Segment Me In TimE

Paper • 2410.18538 • Published Oct 24, 2024 • 15
ReferEverything: Towards Segmenting Everything We Can Speak of in Videos

Paper • 2410.23287 • Published Oct 30, 2024 • 19

Most interesting Papers

Gemma 2: Improving Open Language Models at a Practical Size

Paper • 2408.00118 • Published Jul 31, 2024 • 75
SAM 2: Segment Anything in Images and Videos

Paper • 2408.00714 • Published Aug 1, 2024 • 109
The Llama 3 Herd of Models

Paper • 2407.21783 • Published Jul 31, 2024 • 110

Best models to test

SAM 2: Segment Anything in Images and Videos

Paper • 2408.00714 • Published Aug 1, 2024 • 109

2024 Papers of the year

The Llama 3 Herd of Models

Paper • 2407.21783 • Published Jul 31, 2024 • 110
SAM 2: Segment Anything in Images and Videos

Paper • 2408.00714 • Published Aug 1, 2024 • 109
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking

Paper • 2403.09629 • Published Mar 14, 2024 • 75

SAM 2: Segment Anything in Images and Videos

Paper • 2408.00714 • Published Aug 1, 2024 • 109
Surgical SAM 2: Real-time Segment Anything in Surgical Video by Efficient Frame Pruning

Paper • 2408.07931 • Published Aug 15, 2024 • 19
DELTA: Dense Efficient Long-range 3D Tracking for any video

Paper • 2410.24211 • Published Oct 31, 2024 • 8

Fast Matrix Multiplications for Lookup Table-Quantized LLMs

Paper • 2407.10960 • Published Jul 15, 2024 • 12
ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities

Paper • 2407.14482 • Published Jul 19, 2024 • 26
EVLM: An Efficient Vision-Language Model for Visual Understanding

Paper • 2407.14177 • Published Jul 19, 2024 • 43
Knowledge Mechanisms in Large Language Models: A Survey and Perspective

Paper • 2407.15017 • Published Jul 22, 2024 • 34

PAS: Data-Efficient Plug-and-Play Prompt Augmentation System

Paper • 2407.06027 • Published Jul 8, 2024 • 8
SpreadsheetLLM: Encoding Spreadsheets for Large Language Models

Paper • 2407.09025 • Published Jul 12, 2024 • 130
Toto: Time Series Optimized Transformer for Observability

Paper • 2407.07874 • Published Jul 10, 2024 • 30
SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers

Paper • 2407.09413 • Published Jul 12, 2024 • 10

Depth Anything V2

Paper • 2406.09414 • Published Jun 13, 2024 • 95
An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels

Paper • 2406.09415 • Published Jun 13, 2024 • 50
Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion

Paper • 2406.04338 • Published Jun 6, 2024 • 34
SAM 2: Segment Anything in Images and Videos

Paper • 2408.00714 • Published Aug 1, 2024 • 109

MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training

Paper • 2311.17049 • Published Nov 28, 2023 • 1
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

Paper • 2405.04434 • Published May 7, 2024 • 14
A Study of Autoregressive Decoders for Multi-Tasking in Computer Vision

Paper • 2303.17376 • Published Mar 30, 2023
Sigmoid Loss for Language Image Pre-Training

Paper • 2303.15343 • Published Mar 27, 2023 • 5

Previous
1
2
3
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs