Multimodal Autoregressive Pre-training of Large Vision Encoders Paper • 2411.14402 • Published 3 days ago • 35
LLaVA-o1: Let Vision Language Models Reason Step-by-Step Paper • 2411.10440 • Published 9 days ago • 95
AnimateAnything: Consistent and Controllable Animation for Video Generation Paper • 2411.10836 • Published 8 days ago • 18
RedPajama: an Open Dataset for Training Large Language Models Paper • 2411.12372 • Published 5 days ago • 44
Rapid Response: Mitigating LLM Jailbreaks with a Few Examples Paper • 2411.07494 • Published 13 days ago • 1
Stronger Models are NOT Stronger Teachers for Instruction Tuning Paper • 2411.07133 • Published 13 days ago • 30
FineTuneBench: How well do commercial fine-tuning APIs infuse knowledge into LLMs? Paper • 2411.05059 • Published 17 days ago • 1
mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding Paper • 2409.03420 • Published Sep 5 • 25
Needle Threading: Can LLMs Follow Threads through Near-Million-Scale Haystacks? Paper • 2411.05000 • Published 17 days ago • 21
TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation Paper • 2411.04709 • Published 19 days ago • 25
Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models Paper • 2411.04996 • Published 17 days ago • 48
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models Paper • 2411.04905 • Published 17 days ago • 109
From Medprompt to o1: Exploration of Run-Time Strategies for Medical Challenge Problems and Beyond Paper • 2411.03590 • Published 19 days ago • 9
Language Models can Self-Lengthen to Generate Long Texts Paper • 2410.23933 • Published 24 days ago • 16
Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders Paper • 2410.22366 • Published 27 days ago • 74
LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding Paper • 2410.17434 • Published Oct 22 • 24