Daily paper that is inspiring (abstract is enough) - a xmxx Collection

xmxx 's Collections

Daily paper that is inspiring (abstract is enough)

Daily paper that worth reading in details later

Daily paper that is inspiring (abstract is enough)

updated 3 days ago

World Model on Million-Length Video And Language With RingAttention

Paper • 2402.08268 • Published Feb 13 • 36
Improving Text Embeddings with Large Language Models

Paper • 2401.00368 • Published Dec 31, 2023 • 78
Chain-of-Thought Reasoning Without Prompting

Paper • 2402.10200 • Published Feb 15 • 92
FiT: Flexible Vision Transformer for Diffusion Model

Paper • 2402.12376 • Published Feb 19 • 48
Neural Network Diffusion

Paper • 2402.13144 • Published Feb 20 • 94
Aria Everyday Activities Dataset

Paper • 2402.13349 • Published Feb 20 • 28
VideoAgent: Long-form Video Understanding with Large Language Model as Agent

Paper • 2403.10517 • Published Mar 15 • 29
Getting it Right: Improving Spatial Consistency in Text-to-Image Models

Paper • 2404.01197 • Published Apr 1 • 29
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD

Paper • 2404.06512 • Published Apr 9 • 29
Adapting LLaMA Decoder to Vision Transformer

Paper • 2404.06773 • Published Apr 10 • 14
Rho-1: Not All Tokens Are What You Need

Paper • 2404.07965 • Published Apr 11 • 80
Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies

Paper • 2404.08197 • Published Apr 12 • 27
LoRA Learns Less and Forgets Less

Paper • 2405.09673 • Published May 15 • 81
Many-Shot In-Context Learning in Multimodal Foundation Models

Paper • 2405.09798 • Published May 16 • 26
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning

Paper • 2405.12130 • Published May 20 • 44
OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework

Paper • 2405.11143 • Published May 20 • 33
Octo: An Open-Source Generalist Robot Policy

Paper • 2405.12213 • Published May 20 • 22
ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models

Paper • 2405.15738 • Published May 24 • 43
LLMs achieve adult human performance on higher-order theory of mind tasks

Paper • 2405.18870 • Published May 29 • 16
Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step

Paper • 2406.04314 • Published Jun 6 • 26
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation

Paper • 2406.06525 • Published Jun 10 • 62
Vript: A Video Is Worth Thousands of Words

Paper • 2406.06040 • Published Jun 10 • 19
Mixture-of-Agents Enhances Large Language Model Capabilities

Paper • 2406.04692 • Published Jun 7 • 50
GenAI Arena: An Open Evaluation Platform for Generative Models

Paper • 2406.04485 • Published Jun 6 • 19
What If We Recaption Billions of Web Images with LLaMA-3?

Paper • 2406.08478 • Published Jun 12 • 38
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

Paper • 2406.07476 • Published Jun 11 • 30
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

Paper • 2406.08418 • Published Jun 12 • 28
DataComp-LM: In search of the next generation of training sets for language models

Paper • 2406.11794 • Published Jun 17 • 45
MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens

Paper • 2406.11271 • Published Jun 17 • 10
Instruction Pre-Training: Language Models are Supervised Multitask Learners

Paper • 2406.14491 • Published Jun 20 • 76
nabla^2DFT: A Universal Quantum Chemistry Dataset of Drug-Like Molecules and a Benchmark for Neural Network Potentials

Paper • 2406.14347 • Published Jun 20 • 99
MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding

Paper • 2406.14515 • Published Jun 20 • 29
Video-Infinity: Distributed Long Video Generation

Paper • 2406.16260 • Published 28 days ago • 28
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale

Paper • 2406.17557 • Published 27 days ago • 75
Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs

Paper • 2406.18629 • Published 26 days ago • 37
HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale

Paper • 2406.19280 • Published 25 days ago • 55
LLaRA: Supercharging Robot Learning Data for Vision-Language Policy

Paper • 2406.20095 • Published 24 days ago • 17
LiteSearch: Efficacious Tree Search for LLM

Paper • 2407.00320 • Published 23 days ago • 37
OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation

Paper • 2407.02371 • Published 20 days ago • 47
Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models

Paper • 2407.01906 • Published 20 days ago • 33
Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision

Paper • 2407.06189 • Published 14 days ago • 24
Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies

Paper • 2407.13623 • Published 4 days ago • 37