SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity Paper • 2401.17072 • Published Jan 30 • 22
Lumiere: A Space-Time Diffusion Model for Video Generation Paper • 2401.12945 • Published Jan 23 • 82
PALP: Prompt Aligned Personalization of Text-to-Image Models Paper • 2401.06105 • Published Jan 11 • 46
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data Paper • 2401.10891 • Published Jan 19 • 53
Tag-LLM: Repurposing General-Purpose LLMs for Specialized Domains Paper • 2402.05140 • Published Feb 6 • 18
Self-Discover: Large Language Models Self-Compose Reasoning Structures Paper • 2402.03620 • Published Feb 6 • 102
OS-Copilot: Towards Generalist Computer Agents with Self-Improvement Paper • 2402.07456 • Published Feb 12 • 38
Lumos: Learning Agents with Unified Data, Modular Design, and Open-Source LLMs Paper • 2311.05657 • Published Nov 9, 2023 • 26
Premise Order Matters in Reasoning with Large Language Models Paper • 2402.08939 • Published Feb 14 • 23
World Model on Million-Length Video And Language With RingAttention Paper • 2402.08268 • Published Feb 13 • 33
Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling Paper • 2401.16380 • Published Jan 29 • 45
VideoPrism: A Foundational Visual Encoder for Video Understanding Paper • 2402.13217 • Published Feb 20 • 18
Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models Paper • 2402.13064 • Published Feb 20 • 45
When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method Paper • 2402.17193 • Published Feb 27 • 23
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits Paper • 2402.17764 • Published Feb 27 • 564
Instruction-tuned Language Models are Better Knowledge Learners Paper • 2402.12847 • Published Feb 20 • 25
Design2Code: How Far Are We From Automating Front-End Engineering? Paper • 2403.03163 • Published Mar 5 • 92
Recovering the Pre-Fine-Tuning Weights of Generative Models Paper • 2402.10208 • Published Feb 15 • 6
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers Paper • 2402.19479 • Published Feb 29 • 30
Finetuned Multimodal Language Models Are High-Quality Image-Text Data Filters Paper • 2403.02677 • Published Mar 5 • 16
VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models Paper • 2403.05438 • Published Mar 8 • 14
Synth^2: Boosting Visual-Language Models with Synthetic Captions and Image Embeddings Paper • 2403.07750 • Published Mar 12 • 19
ShortGPT: Layers in Large Language Models are More Redundant Than You Expect Paper • 2403.03853 • Published Mar 6 • 61
Follow-Your-Click: Open-domain Regional Image Animation via Short Prompts Paper • 2403.08268 • Published Mar 13 • 15
GiT: Towards Generalist Vision Transformer through Universal Language Interface Paper • 2403.09394 • Published Mar 14 • 25
Lifelong Benchmarks: Efficient Model Evaluation in an Era of Rapid Progress Paper • 2402.19472 • Published Feb 29 • 2
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training Paper • 2403.09611 • Published Mar 14 • 119
Simple and Scalable Strategies to Continually Pre-train Large Language Models Paper • 2403.08763 • Published Mar 13 • 48
Enhancing Vision-Language Pre-training with Rich Supervisions Paper • 2403.03346 • Published Mar 5 • 12
VideoMamba: State Space Model for Efficient Video Understanding Paper • 2403.06977 • Published Mar 11 • 21
Gemma: Open Models Based on Gemini Research and Technology Paper • 2403.08295 • Published Mar 13 • 41
Visual CoT: Unleashing Chain-of-Thought Reasoning in Multi-Modal Language Models Paper • 2403.16999 • Published Mar 25 • 3
Magic Fixup: Streamlining Photo Editing by Watching Dynamic Videos Paper • 2403.13044 • Published Mar 19 • 13
InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding Paper • 2403.15377 • Published Mar 22 • 16
AnyV2V: A Plug-and-Play Framework For Any Video-to-Video Editing Tasks Paper • 2403.14468 • Published Mar 21 • 18
Motion Mamba: Efficient and Long Sequence Motion Generation with Hierarchical and Bidirectional Selective SSM Paper • 2403.07487 • Published Mar 12 • 11
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context Paper • 2403.05530 • Published Mar 8 • 49
Gecko: Versatile Text Embeddings Distilled from Large Language Models Paper • 2403.20327 • Published Mar 29 • 41
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models Paper • 2404.02258 • Published Apr 2 • 99
SwapAnything: Enabling Arbitrary Object Swapping in Personalized Visual Editing Paper • 2404.05717 • Published about 1 month ago • 23
No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance Paper • 2404.04125 • Published Apr 4 • 25
RULER: What's the Real Context Size of Your Long-Context Language Models? Paper • 2404.06654 • Published 30 days ago • 30
CodecLM: Aligning Language Models with Tailored Synthetic Data Paper • 2404.05875 • Published about 1 month ago • 15
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length Paper • 2404.08801 • Published 27 days ago • 61
Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models Paper • 2404.12387 • Published 21 days ago • 34
VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time Paper • 2404.10667 • Published 23 days ago • 11
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing Paper • 2404.12253 • Published 21 days ago • 49
Video2Game: Real-time, Interactive, Realistic and Browser-Compatible Environment from a Single Video Paper • 2404.09833 • Published 24 days ago • 27
HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing Paper • 2404.09990 • Published 24 days ago • 11
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone Paper • 2404.14219 • Published 17 days ago • 228
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework Paper • 2404.14619 • Published 17 days ago • 118
AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs Paper • 2404.16873 • Published 18 days ago • 22
Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models Paper • 2404.18796 • Published 10 days ago • 60
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models Paper • 2405.01535 • Published 7 days ago • 71
FLAME: Factuality-Aware Alignment for Large Language Models Paper • 2405.01525 • Published 7 days ago • 16
Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory Paper • 2310.17884 • Published Oct 27, 2023 • 1
Is ImageNet worth 1 video? Learning strong image encoders from 1 long unlabelled video Paper • 2310.08584 • Published Oct 12, 2023 • 2