SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity Paper • 2401.17072 • Published Jan 30, 2024 • 23
Lumiere: A Space-Time Diffusion Model for Video Generation Paper • 2401.12945 • Published Jan 23, 2024 • 85
PALP: Prompt Aligned Personalization of Text-to-Image Models Paper • 2401.06105 • Published Jan 11, 2024 • 46
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data Paper • 2401.10891 • Published Jan 19, 2024 • 54
Tag-LLM: Repurposing General-Purpose LLMs for Specialized Domains Paper • 2402.05140 • Published Feb 6, 2024 • 18
Self-Discover: Large Language Models Self-Compose Reasoning Structures Paper • 2402.03620 • Published Feb 6, 2024 • 106
OS-Copilot: Towards Generalist Computer Agents with Self-Improvement Paper • 2402.07456 • Published Feb 12, 2024 • 39
Lumos: Learning Agents with Unified Data, Modular Design, and Open-Source LLMs Paper • 2311.05657 • Published Nov 9, 2023 • 26
Premise Order Matters in Reasoning with Large Language Models Paper • 2402.08939 • Published Feb 14, 2024 • 23
World Model on Million-Length Video And Language With RingAttention Paper • 2402.08268 • Published Feb 13, 2024 • 35
Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling Paper • 2401.16380 • Published Jan 29, 2024 • 46
VideoPrism: A Foundational Visual Encoder for Video Understanding Paper • 2402.13217 • Published Feb 20, 2024 • 19
Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models Paper • 2402.13064 • Published Feb 20, 2024 • 46
When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method Paper • 2402.17193 • Published Feb 27, 2024 • 23
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits Paper • 2402.17764 • Published Feb 27, 2024 • 574
Instruction-tuned Language Models are Better Knowledge Learners Paper • 2402.12847 • Published Feb 20, 2024 • 24
Design2Code: How Far Are We From Automating Front-End Engineering? Paper • 2403.03163 • Published Mar 5, 2024 • 92
Recovering the Pre-Fine-Tuning Weights of Generative Models Paper • 2402.10208 • Published Feb 15, 2024 • 7
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers Paper • 2402.19479 • Published Feb 29, 2024 • 30
Finetuned Multimodal Language Models Are High-Quality Image-Text Data Filters Paper • 2403.02677 • Published Mar 5, 2024 • 16
VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models Paper • 2403.05438 • Published Mar 8, 2024 • 15
Synth^2: Boosting Visual-Language Models with Synthetic Captions and Image Embeddings Paper • 2403.07750 • Published Mar 12, 2024 • 19
ShortGPT: Layers in Large Language Models are More Redundant Than You Expect Paper • 2403.03853 • Published Mar 6, 2024 • 61
Follow-Your-Click: Open-domain Regional Image Animation via Short Prompts Paper • 2403.08268 • Published Mar 13, 2024 • 15
GiT: Towards Generalist Vision Transformer through Universal Language Interface Paper • 2403.09394 • Published Mar 14, 2024 • 25
Lifelong Benchmarks: Efficient Model Evaluation in an Era of Rapid Progress Paper • 2402.19472 • Published Feb 29, 2024 • 2
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training Paper • 2403.09611 • Published Mar 14, 2024 • 123
Simple and Scalable Strategies to Continually Pre-train Large Language Models Paper • 2403.08763 • Published Mar 13, 2024 • 48
Enhancing Vision-Language Pre-training with Rich Supervisions Paper • 2403.03346 • Published Mar 5, 2024 • 12
VideoMamba: State Space Model for Efficient Video Understanding Paper • 2403.06977 • Published Mar 11, 2024 • 24
Gemma: Open Models Based on Gemini Research and Technology Paper • 2403.08295 • Published Mar 13, 2024 • 44
Visual CoT: Unleashing Chain-of-Thought Reasoning in Multi-Modal Language Models Paper • 2403.16999 • Published Mar 25, 2024 • 3
Magic Fixup: Streamlining Photo Editing by Watching Dynamic Videos Paper • 2403.13044 • Published Mar 19, 2024 • 14
InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding Paper • 2403.15377 • Published Mar 22, 2024 • 17
AnyV2V: A Plug-and-Play Framework For Any Video-to-Video Editing Tasks Paper • 2403.14468 • Published Mar 21, 2024 • 18
Motion Mamba: Efficient and Long Sequence Motion Generation with Hierarchical and Bidirectional Selective SSM Paper • 2403.07487 • Published Mar 12, 2024 • 12
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context Paper • 2403.05530 • Published Mar 8, 2024 • 51
Gecko: Versatile Text Embeddings Distilled from Large Language Models Paper • 2403.20327 • Published Mar 29, 2024 • 47
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models Paper • 2404.02258 • Published Apr 2, 2024 • 102
SwapAnything: Enabling Arbitrary Object Swapping in Personalized Visual Editing Paper • 2404.05717 • Published Apr 8, 2024 • 23
No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance Paper • 2404.04125 • Published Apr 4, 2024 • 27
RULER: What's the Real Context Size of Your Long-Context Language Models? Paper • 2404.06654 • Published Apr 9, 2024 • 32
CodecLM: Aligning Language Models with Tailored Synthetic Data Paper • 2404.05875 • Published Apr 8, 2024 • 15
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length Paper • 2404.08801 • Published Apr 12, 2024 • 62
Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models Paper • 2404.12387 • Published Apr 18, 2024 • 36
VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time Paper • 2404.10667 • Published Apr 16, 2024 • 13
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing Paper • 2404.12253 • Published Apr 18, 2024 • 52
Video2Game: Real-time, Interactive, Realistic and Browser-Compatible Environment from a Single Video Paper • 2404.09833 • Published Apr 15, 2024 • 29
HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing Paper • 2404.09990 • Published Apr 15, 2024 • 11
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone Paper • 2404.14219 • Published Apr 22, 2024 • 240
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework Paper • 2404.14619 • Published Apr 22, 2024 • 124
Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models Paper • 2404.18796 • Published Apr 29, 2024 • 67
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models Paper • 2405.01535 • Published May 2, 2024 • 106
FLAME: Factuality-Aware Alignment for Large Language Models Paper • 2405.01525 • Published May 2, 2024 • 21
Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory Paper • 2310.17884 • Published Oct 27, 2023 • 1
Is ImageNet worth 1 video? Learning strong image encoders from 1 long unlabelled video Paper • 2310.08584 • Published Oct 12, 2023 • 2
Better & Faster Large Language Models via Multi-token Prediction Paper • 2404.19737 • Published Apr 30, 2024 • 65
Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation Paper • 2404.19752 • Published Apr 30, 2024 • 20
Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations? Paper • 2405.05904 • Published May 9, 2024 • 5
A Careful Examination of Large Language Model Performance on Grade School Arithmetic Paper • 2405.00332 • Published May 1, 2024 • 30
CLIP with Quality Captions: A Strong Pretraining for Vision Tasks Paper • 2405.08911 • Published May 14, 2024 • 1
Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity Paper • 2403.14403 • Published Mar 21, 2024 • 6
FIFO-Diffusion: Generating Infinite Videos from Text without Training Paper • 2405.11473 • Published May 19, 2024 • 53
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning Paper • 2405.12130 • Published May 20, 2024 • 44
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report Paper • 2405.00732 • Published Apr 29, 2024 • 116
Aya 23: Open Weight Releases to Further Multilingual Progress Paper • 2405.15032 • Published May 23, 2024 • 21
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis Paper • 2405.21075 • Published May 31, 2024 • 15
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions Paper • 2406.04325 • Published Jun 2024 • 69
Verbalized Machine Learning: Revisiting Machine Learning with Language Models Paper • 2406.04344 • Published Jun 2024 • 1
WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild Paper • 2406.04770 • Published Jun 2024 • 24
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation Paper • 2406.06525 • Published Jun 2024 • 60
Multimodal Needle in a Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models Paper • 2406.11230 • Published Jun 2024 • 34
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence Paper • 2406.11931 • Published Jun 2024 • 54
Instruction Pre-Training: Language Models are Supervised Multitask Learners Paper • 2406.14491 • Published Jun 2024 • 75
Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs Paper • 2406.14544 • Published Jun 2024 • 33
Self-Tuning: Instructing LLMs to Effectively Acquire New Knowledge through Self-Teaching Paper • 2406.06326 • Published Jun 2024 • 1
LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs Paper • 2406.15319 • Published Jun 2024 • 55
Unlocking Continual Learning Abilities in Language Models Paper • 2406.17245 • Published Jun 2024 • 27
Octo-planner: On-device Language Model for Planner-Action Agents Paper • 2406.18082 • Published Jun 2024 • 45