How Far is Video Generation from World Model: A Physical Law Perspective (arXiv:2411.02385)
Improving Steering Vectors by Targeting Sparse Autoencoder Features (arXiv:2411.02193)
Adaptive Length Image Tokenization via Recurrent Allocation (arXiv:2411.02393)
Thinking Forward and Backward: Effective Backward Planning with Large Language Models (arXiv:2411.01790)
Inference Optimal VLMs Need Only One Visual Token but Larger Models (arXiv:2411.03312)
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents (arXiv:2410.23218)
nGPT: Normalized Transformer with Representation Learning on the Hypersphere (arXiv:2410.01131)
iVideoGPT: Interactive VideoGPTs are Scalable World Models (arXiv:2405.15223)
Energy-Based Diffusion Language Models for Text Generation (arXiv:2410.21357)
Pyramidal Flow Matching for Efficient Video Generative Modeling (arXiv:2410.05954)
The Scene Language: Representing Scenes with Programs, Words, and Embeddings (arXiv:2410.16770)
HART: Efficient Visual Generation with Hybrid Autoregressive Transformer (arXiv:2410.10812)
Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs (arXiv:2410.13835)
In-Context Learning Enables Robot Action Prediction in LLMs (arXiv:2410.12782)