Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations? • arXiv:2405.05904 • May 2024
Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation • arXiv:2404.19752 • Apr 2024
Better & Faster Large Language Models via Multi-token Prediction • arXiv:2404.19737 • Apr 2024
Proving Test Set Contamination in Black Box Language Models • arXiv:2310.17623 • Oct 26, 2023
Is ImageNet worth 1 video? Learning strong image encoders from 1 long unlabelled video • arXiv:2310.08584 • Oct 12, 2023
Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory • arXiv:2310.17884 • Oct 27, 2023
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report • arXiv:2405.00732 • May 2024
FLAME: Factuality-Aware Alignment for Large Language Models • arXiv:2405.01525 • May 2024
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models • arXiv:2405.01535 • May 2024
InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation • arXiv:2404.19427 • Apr 2024
Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models • arXiv:2404.18796 • Apr 2024
AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs • arXiv:2404.16873 • Apr 2024
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites • arXiv:2404.16821 • Apr 2024
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework • arXiv:2404.14619 • Apr 2024
Align Your Steps: Optimizing Sampling Schedules in Diffusion Models • arXiv:2404.14507 • Apr 2024
The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions • arXiv:2404.13208 • Apr 2024
How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study • arXiv:2404.14047 • Apr 2024
SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation • arXiv:2404.14396 • Apr 2024
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone • arXiv:2404.14219 • Apr 2024
Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models • arXiv:2404.13013 • Apr 2024
TextSquare: Scaling up Text-Centric Visual Instruction Tuning • arXiv:2404.12803 • Apr 2024
MeshLRM: Large Reconstruction Model for High-Quality Mesh • arXiv:2404.12385 • Apr 2024
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing • arXiv:2404.12253 • Apr 2024
Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models • arXiv:2404.12387 • Apr 2024
VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time • arXiv:2404.10667 • Apr 2024
HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing • arXiv:2404.09990 • Apr 2024
Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model • arXiv:2404.09967 • Apr 2024
Video2Game: Real-time, Interactive, Realistic and Browser-Compatible Environment from a Single Video • arXiv:2404.09833 • Apr 2024
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length • arXiv:2404.08801 • Apr 2024
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models • arXiv:2404.07973 • Apr 2024
Lost in Translation: Modern Neural Networks Still Struggle With Small Realistic Image Transformations • arXiv:2404.07153 • Apr 10, 2024
CodecLM: Aligning Language Models with Tailored Synthetic Data • arXiv:2404.05875 • Apr 8, 2024
RULER: What's the Real Context Size of Your Long-Context Language Models? • arXiv:2404.06654 • Apr 9, 2024
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention • arXiv:2404.07143 • Apr 10, 2024
MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding • arXiv:2404.05726 • Apr 8, 2024
SwapAnything: Enabling Arbitrary Object Swapping in Personalized Visual Editing • arXiv:2404.05717 • Apr 8, 2024
AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent • arXiv:2404.03648 • Apr 4, 2024
No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance • arXiv:2404.04125 • Apr 4, 2024
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models • arXiv:2404.02258 • Apr 2, 2024
Bigger is not Always Better: Scaling Properties of Latent Diffusion Models • arXiv:2404.01367 • Apr 1, 2024
LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model • arXiv:2404.01331 • Mar 29, 2024
Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models • arXiv:2403.20331 • Mar 29, 2024
Gecko: Versatile Text Embeddings Distilled from Large Language Models • arXiv:2403.20327 • Mar 29, 2024
AnyV2V: A Plug-and-Play Framework For Any Video-to-Video Editing Tasks • arXiv:2403.14468 • Mar 21, 2024
InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding • arXiv:2403.15377 • Mar 22, 2024
Magic Fixup: Streamlining Photo Editing by Watching Dynamic Videos • arXiv:2403.13044 • Mar 19, 2024