MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems? Paper • 2403.14624 • Published Mar 21 • 48
Article It's raining diffusion personalization techniques☔️🎭🖼️ By linoyts • 17 days ago • 15
Article Orchestration of Experts: The First-Principle Multi-Model System By alirezamsh • 12 days ago • 8
Article Mergoo: Efficiently Build Your Own MoE LLM By alirezamsh • about 8 hours ago • 26
Llama2-7B HQQ+ Collection Extreme low-bit quantization with HQQ+ (HQQ + LoRA adapter) • 3 items • Updated 9 days ago • 14
DBRX Collection DBRX is a mixture-of-experts (MoE) large language model trained from scratch by Databricks. • 3 items • Updated Mar 27 • 84
VideoMamba: State Space Model for Efficient Video Understanding Paper • 2403.06977 • Published Mar 11 • 21
An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models Paper • 2403.06764 • Published Mar 11 • 24
DragAnything: Motion Control for Anything using Entity Representation Paper • 2403.07420 • Published Mar 12 • 11
MetricX-23 Collection A collection of MetricX-23 models (https://aclanthology.org/2023.wmt-1.63/) • 6 items • Updated 19 days ago • 12
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models Paper • 2402.03749 • Published Feb 6 • 9
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads Paper • 2401.10774 • Published Jan 19 • 49
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data Paper • 2401.10891 • Published Jan 19 • 53
PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding Paper • 2312.04461 • Published Dec 7, 2023 • 47
SigLIP Collection Contrastive (sigmoid) image-text models from https://arxiv.org/abs/2303.15343 • 8 items • Updated 19 days ago • 21
Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively Paper • 2401.02955 • Published Jan 5 • 16
Progressive Knowledge Distillation Of Stable Diffusion XL Using Layer Level Loss Paper • 2401.02677 • Published Jan 5 • 21
Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding Paper • 2401.04398 • Published Jan 9 • 18
Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models Paper • 2401.04658 • Published Jan 9 • 23
Let's Go Shopping (LGS) -- Web-Scale Image-Text Dataset for Visual Concept Understanding Paper • 2401.04575 • Published Jan 9 • 14
Masked Audio Generation using a Single Non-Autoregressive Transformer Paper • 2401.04577 • Published Jan 9 • 37
Instruct-Imagen: Image Generation with Multi-modal Instruction Paper • 2401.01952 • Published Jan 3 • 29
LLaMA Beyond English: An Empirical Study on Language Capability Transfer Paper • 2401.01055 • Published Jan 2 • 49
DocLLM: A layout-aware generative language model for multimodal document understanding Paper • 2401.00908 • Published Dec 31, 2023 • 171
Parameter Efficient Tuning Allows Scalable Personalization of LLMs for Text Entry: A Case Study on Abbreviation Expansion Paper • 2312.14327 • Published Dec 21, 2023 • 6
WaveCoder: Widespread And Versatile Enhanced Instruction Tuning with Refined Data Generation Paper • 2312.14187 • Published Dec 20, 2023 • 48
Gemini vs GPT-4V: A Preliminary Comparison and Combination of Vision-Language Models Through Qualitative Cases Paper • 2312.15011 • Published Dec 22, 2023 • 15
Principled Instructions Are All You Need for Questioning LLaMA-1/2, GPT-3.5/4 Paper • 2312.16171 • Published Dec 26, 2023 • 30
UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces Paper • 2312.15715 • Published Dec 25, 2023 • 19
Make-A-Character: High Quality Text-to-3D Character Generation within Minutes Paper • 2312.15430 • Published Dec 24, 2023 • 25
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling Paper • 2312.15166 • Published Dec 23, 2023 • 55
BigBIO: A Framework for Data-Centric Biomedical Natural Language Processing Paper • 2206.15076 • Published Jun 30, 2022 • 3
Continuous Learning in a Hierarchical Multiscale Neural Network Paper • 1805.05758 • Published May 15, 2018 • 1
Silkie: Preference Distillation for Large Visual Language Models Paper • 2312.10665 • Published Dec 17, 2023 • 10
LLM in a flash: Efficient Large Language Model Inference with Limited Memory Paper • 2312.11514 • Published Dec 12, 2023 • 252
LLM Leaderboard best models ❤️🔥 Collection A daily uploaded list of models with best evaluations on the LLM leaderboard • 63 items • Updated 3 days ago • 283
SHAP-EDITOR: Instruction-guided Latent 3D Editing in Seconds Paper • 2312.09246 • Published Dec 14, 2023 • 5
LIME: Localized Image Editing via Attention Regularization in Diffusion Models Paper • 2312.09256 • Published Dec 14, 2023 • 8
FineControlNet: Fine-level Text Control for Image Generation with Spatially Aligned Text Control Injection Paper • 2312.09252 • Published Dec 14, 2023 • 9
SEEAvatar: Photorealistic Text-to-3D Avatar Generation with Constrained Geometry and Appearance Paper • 2312.08889 • Published Dec 13, 2023 • 10
TinyGSM: achieving >80% on GSM8k with small language models Paper • 2312.09241 • Published Dec 14, 2023 • 33
Faster Diffusion: Rethinking the Role of UNet Encoder in Diffusion Models Paper • 2312.09608 • Published Dec 15, 2023 • 13