No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding Paper • 2405.08344 • Published 2 days ago • 7
Understanding the performance gap between online and offline alignment algorithms Paper • 2405.08448 • Published 2 days ago • 6
Coin3D: Controllable and Interactive 3D Assets Generation with Proxy-Guided Conditioning Paper • 2405.08054 • Published 3 days ago • 11
VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models Paper • 2403.06098 • Published Mar 10 • 15
Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots Paper • 2405.07990 • Published 3 days ago • 14
VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks Paper • 2305.11175 • Published May 18, 2023 • 2
Article Introducing Idefics2: A Powerful 8B Vision-Language Model for the community Apr 15 • 124
DOCCI: Descriptions of Connected and Contrasting Images Paper • 2404.19753 • Published 16 days ago • 9
Transferable and Principled Efficiency for Open-Vocabulary Segmentation Paper • 2404.07448 • Published Apr 11 • 10
Granite Code Models Collection A series of code models trained by IBM and licensed under the Apache 2.0 license. We release both the base pretrained and instruct models. • 10 items • Updated 4 days ago • 116
MGM Collection Official model collection for the paper "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models" • 13 items • Updated 13 days ago • 43
FLAME: Factuality-Aware Alignment for Large Language Models Paper • 2405.01525 • Published 14 days ago • 20
NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment Paper • 2405.01481 • Published 14 days ago • 20
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report Paper • 2405.00732 • Published 17 days ago • 104
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models Paper • 2405.01535 • Published 14 days ago • 88
PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training Paper • 2309.10400 • Published Sep 19, 2023 • 22
Spectrally Pruned Gaussian Fields with Neural Compensation Paper • 2405.00676 • Published 15 days ago • 8
A Careful Examination of Large Language Model Performance on Grade School Arithmetic Paper • 2405.00332 • Published 15 days ago • 23
Is Bigger Edit Batch Size Always Better? -- An Empirical Study on Model Editing with Llama-3 Paper • 2405.00664 • Published 15 days ago • 16
Better & Faster Large Language Models via Multi-token Prediction Paper • 2404.19737 • Published 16 days ago • 61
BlenderAlchemy: Editing 3D Graphics with Vision-Language Models Paper • 2404.17672 • Published 19 days ago • 17
Ag2Manip: Learning Novel Manipulation Skills with Agent-Agnostic Visual and Action Representations Paper • 2404.17521 • Published 20 days ago • 12
Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models Paper • 2404.18796 • Published 17 days ago • 62
AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs Paper • 2404.16873 • Published 24 days ago • 25
PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning Paper • 2404.16994 • Published 20 days ago • 30
Article 🦙⚗️ Using Llama3 and distilabel to build fine-tuning datasets By dvilasuero • 19 days ago • 54
ChatAnything: Facetime Chat with LLM-Enhanced Personas Paper • 2311.06772 • Published Nov 12, 2023 • 33
List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs Paper • 2404.16375 • Published 21 days ago • 14
Interactive3D: Create What You Want by Interactive 3D Generation Paper • 2404.16510 • Published 21 days ago • 17
Layer Skip: Enabling Early Exit Inference and Self-Speculative Decoding Paper • 2404.16710 • Published 21 days ago • 54
SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension Paper • 2404.16790 • Published 21 days ago • 7
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites Paper • 2404.16821 • Published 21 days ago • 48
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework Paper • 2404.14619 • Published 23 days ago • 120
SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation Paper • 2404.14396 • Published 24 days ago • 17
How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study Paper • 2404.14047 • Published 24 days ago • 37
The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions Paper • 2404.13208 • Published 26 days ago • 37
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone Paper • 2404.14219 • Published 24 days ago • 229
Article The Open Medical-LLM Leaderboard: Benchmarking Large Language Models in Healthcare 27 days ago • 64
Article Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent 24 days ago • 71
PhysDreamer: Physics-Based Interaction with 3D Objects via Video Generation Paper • 2404.13026 • Published 27 days ago • 21
AutoCrawler: A Progressive Understanding Web Agent for Web Crawler Generation Paper • 2404.12753 • Published 27 days ago • 38
Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models Paper • 2404.13013 • Published 27 days ago • 26
TextSquare: Scaling up Text-Centric Visual Instruction Tuning Paper • 2404.12803 • Published 27 days ago • 27