PLLaVA: Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning Paper • 2404.16994 • Published 22 days ago • 30
SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation Paper • 2404.14396 • Published 25 days ago • 17
BLINK: Multimodal Large Language Models Can See but Not Perceive Paper • 2404.12390 • Published 29 days ago • 23
Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models Paper • 2404.13013 • Published 28 days ago • 26
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework Paper • 2404.14619 • Published 24 days ago • 120
MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems? Paper • 2403.14624 • Published Mar 21 • 50
Orchestration of Experts: The First-Principle Multi-Model System Article • By alirezamsh • Apr 16 • 8
Design choices for Vision Language Models in 2024 Article • By gigant • about 1 month ago • 18
Llama2-7B HQQ+ Collection Extreme low-bit quantization with HQQ+ (HQQ + LoRA adapter) • 3 items • Updated 28 days ago • 14
DBRX Collection DBRX is a mixture-of-experts (MoE) large language model trained from scratch by Databricks. • 3 items • Updated Mar 27 • 89
VideoMamba: State Space Model for Efficient Video Understanding Paper • 2403.06977 • Published Mar 11 • 21
An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models Paper • 2403.06764 • Published Mar 11 • 24
DragAnything: Motion Control for Anything using Entity Representation Paper • 2403.07420 • Published Mar 12 • 11
MetricX-23 Collection A collection of MetricX-23 models (https://aclanthology.org/2023.wmt-1.63/) • 6 items • Updated 3 days ago • 12
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models Paper • 2402.03749 • Published Feb 6 • 9
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads Paper • 2401.10774 • Published Jan 19 • 50
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data Paper • 2401.10891 • Published Jan 19 • 53
PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding Paper • 2312.04461 • Published Dec 7, 2023 • 48
SigLIP Collection Contrastive (sigmoid) image-text models from https://arxiv.org/abs/2303.15343 • 8 items • Updated 3 days ago • 24
Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively Paper • 2401.02955 • Published Jan 5 • 16
Progressive Knowledge Distillation Of Stable Diffusion XL Using Layer Level Loss Paper • 2401.02677 • Published Jan 5 • 21
Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding Paper • 2401.04398 • Published Jan 9 • 18
Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models Paper • 2401.04658 • Published Jan 9 • 24
Let's Go Shopping (LGS) -- Web-Scale Image-Text Dataset for Visual Concept Understanding Paper • 2401.04575 • Published Jan 9 • 14
Masked Audio Generation using a Single Non-Autoregressive Transformer Paper • 2401.04577 • Published Jan 9 • 37
Instruct-Imagen: Image Generation with Multi-modal Instruction Paper • 2401.01952 • Published Jan 3 • 29
LLaMA Beyond English: An Empirical Study on Language Capability Transfer Paper • 2401.01055 • Published Jan 2 • 50
DocLLM: A layout-aware generative language model for multimodal document understanding Paper • 2401.00908 • Published Dec 31, 2023 • 173
Parameter Efficient Tuning Allows Scalable Personalization of LLMs for Text Entry: A Case Study on Abbreviation Expansion Paper • 2312.14327 • Published Dec 21, 2023 • 6
WaveCoder: Widespread And Versatile Enhanced Instruction Tuning with Refined Data Generation Paper • 2312.14187 • Published Dec 20, 2023 • 49
Gemini vs GPT-4V: A Preliminary Comparison and Combination of Vision-Language Models Through Qualitative Cases Paper • 2312.15011 • Published Dec 22, 2023 • 15
Principled Instructions Are All You Need for Questioning LLaMA-1/2, GPT-3.5/4 Paper • 2312.16171 • Published Dec 26, 2023 • 30
UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces Paper • 2312.15715 • Published Dec 25, 2023 • 19
Make-A-Character: High Quality Text-to-3D Character Generation within Minutes Paper • 2312.15430 • Published Dec 24, 2023 • 25
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling Paper • 2312.15166 • Published Dec 23, 2023 • 55
BigBIO: A Framework for Data-Centric Biomedical Natural Language Processing Paper • 2206.15076 • Published Jun 30, 2022 • 3
Continuous Learning in a Hierarchical Multiscale Neural Network Paper • 1805.05758 • Published May 15, 2018 • 1
Silkie: Preference Distillation for Large Visual Language Models Paper • 2312.10665 • Published Dec 17, 2023 • 10
LLM in a flash: Efficient Large Language Model Inference with Limited Memory Paper • 2312.11514 • Published Dec 12, 2023 • 253
LLM Leaderboard best models ❤️🔥 Collection A daily-updated list of models with the best evaluations on the LLM leaderboard. • 70 items • Updated about 15 hours ago • 304
SHAP-EDITOR: Instruction-guided Latent 3D Editing in Seconds Paper • 2312.09246 • Published Dec 14, 2023 • 5
LIME: Localized Image Editing via Attention Regularization in Diffusion Models Paper • 2312.09256 • Published Dec 14, 2023 • 8
FineControlNet: Fine-level Text Control for Image Generation with Spatially Aligned Text Control Injection Paper • 2312.09252 • Published Dec 14, 2023 • 9