An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels Paper • 2406.09415 • Published 1 day ago • 33
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation Paper • 2406.06525 • Published 4 days ago • 53
Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step Paper • 2406.04314 • Published 8 days ago • 25
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions Paper • 2406.04325 • Published 8 days ago • 62
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models Paper • 2405.01535 • Published May 2 • 105
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report Paper • 2405.00732 • Published Apr 29 • 116
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites Paper • 2404.16821 • Published Apr 25 • 49
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone Paper • 2404.14219 • Published Apr 22 • 239
Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis Paper • 2404.13686 • Published Apr 21 • 26
LucidDreaming: Controllable Object-Centric 3D Generation Paper • 2312.00588 • Published Nov 30, 2023 • 1
ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback Paper • 2404.07987 • Published Apr 11 • 46
Applying Guidance in a Limited Interval Improves Sample and Distribution Quality in Diffusion Models Paper • 2404.07724 • Published Apr 11 • 10
Best Practices and Lessons Learned on Synthetic Data for Language Models Paper • 2404.07503 • Published Apr 11 • 24
From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples Paper • 2404.07544 • Published Apr 11 • 15
Audio Dialogues: Dialogues dataset for audio and music understanding Paper • 2404.07616 • Published Apr 11 • 14
RecurrentGemma: Moving Past Transformers for Efficient Open Language Models Paper • 2404.07839 • Published Apr 11 • 40
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models Paper • 2404.07973 • Published Apr 11 • 28
JetMoE: Reaching Llama2 Performance with 0.1M Dollars Paper • 2404.07413 • Published Apr 11 • 32
Transferable and Principled Efficiency for Open-Vocabulary Segmentation Paper • 2404.07448 • Published Apr 11 • 10
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments Paper • 2404.07972 • Published Apr 11 • 41
UniFL: Improve Stable Diffusion via Unified Feedback Learning Paper • 2404.05595 • Published Apr 8 • 22
ByteEdit: Boost, Comply and Accelerate Generative Image Editing Paper • 2404.04860 • Published Apr 7 • 24
CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion Paper • 2403.05121 • Published Mar 8 • 17
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context Paper • 2403.05530 • Published Mar 8 • 51
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers Paper • 2402.19479 • Published Feb 29 • 30
MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs Paper • 2402.15627 • Published Feb 23 • 32
Zero-Shot Unsupervised and Text-Based Audio Editing Using DDPM Inversion Paper • 2402.10009 • Published Feb 15 • 18
Data Engineering for Scaling Language Models to 128K Context Paper • 2402.10171 • Published Feb 15 • 18
GES: Generalized Exponential Splatting for Efficient Radiance Field Rendering Paper • 2402.10128 • Published Feb 15 • 14
DreamMatcher: Appearance Matching Self-Attention for Semantically-Consistent Text-to-Image Personalization Paper • 2402.09812 • Published Feb 15 • 11
OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset Paper • 2402.10176 • Published Feb 15 • 33
A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts Paper • 2402.09727 • Published Feb 15 • 35
DiffusionGPT: LLM-Driven Text-to-Image Generation System Paper • 2401.10061 • Published Jan 18 • 26
WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens Paper • 2401.09985 • Published Jan 18 • 13
Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision Paper • 2312.09390 • Published Dec 14, 2023 • 32
AlignDet: Aligning Pre-training and Fine-tuning in Object Detection Paper • 2307.11077 • Published Jul 20, 2023 • 1