Daily paper that is inspiring (abstract is enough) - a xmxx Collection

updated about 10 hours ago

World Model on Million-Length Video And Language With RingAttention

Paper • 2402.08268 • Published Feb 13 • 33
Improving Text Embeddings with Large Language Models

Paper • 2401.00368 • Published Dec 31, 2023 • 75
Chain-of-Thought Reasoning Without Prompting

Paper • 2402.10200 • Published Feb 15 • 91
FiT: Flexible Vision Transformer for Diffusion Model

Paper • 2402.12376 • Published Feb 19 • 47
Neural Network Diffusion

Paper • 2402.13144 • Published Feb 20 • 93
Aria Everyday Activities Dataset

Paper • 2402.13349 • Published Feb 20 • 28
VideoAgent: Long-form Video Understanding with Large Language Model as Agent

Paper • 2403.10517 • Published Mar 15 • 28
Getting it Right: Improving Spatial Consistency in Text-to-Image Models

Paper • 2404.01197 • Published Apr 1 • 29
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD

Paper • 2404.06512 • Published Apr 9 • 29
Adapting LLaMA Decoder to Vision Transformer

Paper • 2404.06773 • Published Apr 10 • 13
Rho-1: Not All Tokens Are What You Need

Paper • 2404.07965 • Published Apr 11 • 80
Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies

Paper • 2404.08197 • Published Apr 12 • 26
LoRA Learns Less and Forgets Less

Paper • 2405.09673 • Published May 15 • 78
Many-Shot In-Context Learning in Multimodal Foundation Models

Paper • 2405.09798 • Published May 16 • 25
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning

Paper • 2405.12130 • Published 29 days ago • 44
OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework

Paper • 2405.11143 • Published 30 days ago • 33
Octo: An Open-Source Generalist Robot Policy

Paper • 2405.12213 • Published 29 days ago • 22
ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models

Paper • 2405.15738 • Published 25 days ago • 43
LLMs achieve adult human performance on higher-order theory of mind tasks

Paper • 2405.18870 • Published 20 days ago • 15
Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step

Paper • 2406.04314 • Published 12 days ago • 26
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation

Paper • 2406.06525 • Published 8 days ago • 57
Vript: A Video Is Worth Thousands of Words

Paper • 2406.06040 • Published 8 days ago • 19
Mixture-of-Agents Enhances Large Language Model Capabilities

Paper • 2406.04692 • Published 11 days ago • 43
GenAI Arena: An Open Evaluation Platform for Generative Models

Paper • 2406.04485 • Published 12 days ago • 18
What If We Recaption Billions of Web Images with LLaMA-3?

Paper • 2406.08478 • Published 6 days ago • 35
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

Paper • 2406.07476 • Published 7 days ago • 27
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

Paper • 2406.08418 • Published 6 days ago • 23