MambaVision: A Hybrid Mamba-Transformer Vision Backbone Paper • 2407.08083 • Published 11 days ago • 19 • 4
The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective Paper • 2407.08583 • Published 11 days ago • 10 • 4
Skywork-Math: Data Scaling Laws for Mathematical Reasoning in Large Language Models -- The Story Goes On Paper • 2407.08348 • Published 11 days ago • 46 • 5
LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models Paper • 2407.07895 • Published 12 days ago • 34 • 3
Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence Paper • 2407.07061 • Published 13 days ago • 23 • 4
BM25S: Orders of magnitude faster lexical search via eager sparse scoring Paper • 2407.03618 • Published 18 days ago • 10 • 3
Learning Action and Reasoning-Centric Image Editing from Videos and Simulations Paper • 2407.03471 • Published 19 days ago • 26 • 2
MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation? Paper • 2407.04842 • Published 17 days ago • 49 • 5
PicoAudio: Enabling Precise Timestamp and Frequency Controllability of Audio Events in Text-to-audio Generation Paper • 2407.02869 • Published 19 days ago • 15 • 4
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output Paper • 2407.03320 • Published 19 days ago • 87 • 5
OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation Paper • 2407.02371 • Published 20 days ago • 47 • 6
FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds Paper • 2407.01494 • Published 21 days ago • 10 • 2
HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale Paper • 2406.19280 • Published 25 days ago • 55 • 9
ArzEn-LLM: Code-Switched Egyptian Arabic-English Translation and Speech Recognition Using LLMs Paper • 2406.18120 • Published 26 days ago • 5 • 5
Simulating Classroom Education with LLM-Empowered Agents Paper • 2406.19226 • Published 25 days ago • 28 • 8
Octo-planner: On-device Language Model for Planner-Action Agents Paper • 2406.18082 • Published 26 days ago • 47 • 5
ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation Paper • 2406.18522 • Published 26 days ago • 40 • 3
Segment Any Text: A Universal Approach for Robust, Efficient and Adaptable Sentence Segmentation Paper • 2406.16678 • Published 28 days ago • 13 • 2
Beyond the Turn-Based Game: Enabling Real-Time Conversations with Duplex Models Paper • 2406.15718 • Published about 1 month ago • 14 • 2
OlympicArena Medal Ranks: Who Is the Most Intelligent AI So Far? Paper • 2406.16772 • Published 28 days ago • 2 • 2
AutoDetect: Towards a Unified Framework for Automated Weakness Detection in Large Language Models Paper • 2406.16714 • Published 28 days ago • 10 • 2
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs Paper • 2406.16860 • Published 28 days ago • 52 • 4
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions Paper • 2406.15877 • Published 30 days ago • 43 • 8
Evaluating RAG-Fusion with RAGElo: an Automated Elo-based Framework Paper • 2406.14783 • Published Jun 20 • 15 • 2
EvTexture: Event-driven Texture Enhancement for Video Super-Resolution Paper • 2406.13457 • Published Jun 19 • 15 • 2
Stylebreeder: Exploring and Democratizing Artistic Styles through Text-to-Image Models Paper • 2406.14599 • Published Jun 20 • 16 • 2
LiveMind: Low-latency Large Language Models with Simultaneous Inference Paper • 2406.14319 • Published Jun 20 • 14 • 4
$\nabla^2$DFT: A Universal Quantum Chemistry Dataset of Drug-Like Molecules and a Benchmark for Neural Network Potentials Paper • 2406.14347 • Published Jun 20 • 99 • 4
Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts Paper • 2406.12034 • Published Jun 17 • 12 • 4
Safety Arithmetic: A Framework for Test-time Safety Alignment of Language Models by Steering Parameters and Activations Paper • 2406.11801 • Published Jun 17 • 15 • 4
Depth Anywhere: Enhancing 360 Monocular Depth Estimation via Perspective Distillation and Unlabeled Data Augmentation Paper • 2406.12849 • Published Jun 18 • 48 • 2
Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization Paper • 2406.11431 • Published Jun 17 • 4 • 2
OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI Paper • 2406.12753 • Published Jun 18 • 14 • 2
MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers Paper • 2406.10163 • Published Jun 14 • 30 • 2
ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools Paper • 2406.12793 • Published Jun 18 • 30 • 2
VoCo-LLaMA: Towards Vision Compression with Large Language Models Paper • 2406.12275 • Published Jun 18 • 29 • 10
Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models Paper • 2406.11831 • Published Jun 17 • 19 • 4
GaussianSR: 3D Gaussian Super-Resolution with 2D Diffusion Priors Paper • 2406.10111 • Published Jun 14 • 6 • 2
HelpSteer2: Open-source dataset for training top-performing reward models Paper • 2406.08673 • Published Jun 12 • 14 • 3
CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery Paper • 2406.08587 • Published Jun 12 • 15 • 4