Large Language Models Can Self-Improve in Long-context Reasoning Paper • 2411.08147 • Published Nov 12, 2024 • 62
Sharingan: Extract User Action Sequence from Desktop Recordings Paper • 2411.08768 • Published Nov 13, 2024 • 10
LLaVA-o1: Let Vision Language Models Reason Step-by-Step Paper • 2411.10440 • Published Nov 15, 2024 • 111
The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use Paper • 2411.10323 • Published Nov 15, 2024 • 31
DART: Denoising Autoregressive Transformer for Scalable Text-to-Image Generation Paper • 2410.08159 • Published Oct 10, 2024 • 25
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second Paper • 2410.02073 • Published Oct 2, 2024 • 41
The Devil is in the Details: StyleFeatureEditor for Detail-Rich StyleGAN Inversion and High Quality Image Editing Paper • 2406.10601 • Published Jun 15, 2024 • 65
GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities Paper • 2406.11768 • Published Jun 17, 2024 • 20
SUTRA: Scalable Multilingual Language Model Architecture Paper • 2405.06694 • Published May 7, 2024 • 37
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length Paper • 2404.08801 • Published Apr 12, 2024 • 64