OmniNOCS: A unified NOCS dataset and model for 3D lifting of 2D objects Paper • 2407.08711 • Published 15 days ago • 5
Scaling Up Personalized Aesthetic Assessment via Task Vector Customization Paper • 2407.07176 • Published 17 days ago • 3
Generalizable Implicit Motion Modeling for Video Frame Interpolation Paper • 2407.08680 • Published 15 days ago • 7
Map It Anywhere (MIA): Empowering Bird's Eye View Mapping using Large-scale Public Data Paper • 2407.08726 • Published 15 days ago • 8
Towards Building Specialized Generalist AI with System 1 and System 2 Fusion Paper • 2407.08642 • Published 15 days ago • 9
Live2Diff: Live Stream Translation via Uni-directional Attention in Video Diffusion Models Paper • 2407.08701 • Published 15 days ago • 8
The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective Paper • 2407.08583 • Published 15 days ago • 10
Autoregressive Speech Synthesis without Vector Quantization Paper • 2407.08551 • Published 15 days ago • 12
SEED-Story: Multimodal Long Story Generation with Large Language Model Paper • 2407.08683 • Published 15 days ago • 19
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception Paper • 2407.08303 • Published 16 days ago • 17
MambaVision: A Hybrid Mamba-Transformer Vision Backbone Paper • 2407.08083 • Published 16 days ago • 23
Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist Paper • 2407.08733 • Published 15 days ago • 18
Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients Paper • 2407.08296 • Published 16 days ago • 28
Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model Paper • 2407.07053 • Published 17 days ago • 38
Skywork-Math: Data Scaling Laws for Mathematical Reasoning in Large Language Models -- The Story Goes On Paper • 2407.08348 • Published 16 days ago • 48
A False Sense of Safety: Unsafe Information Leakage in 'Safe' AI Responses Paper • 2407.02551 • Published 24 days ago • 7
Eliminating Position Bias of Language Models: A Mechanistic Approach Paper • 2407.01100 • Published 26 days ago • 6
Investigating Decoder-only Large Language Models for Speech-to-text Translation Paper • 2407.03169 • Published 23 days ago • 9
DisCo-Diff: Enhancing Continuous Diffusion Models with Discrete Latents Paper • 2407.03300 • Published 23 days ago • 10
PicoAudio: Enabling Precise Timestamp and Frequency Controllability of Audio Events in Text-to-audio Generation Paper • 2407.02869 • Published 24 days ago • 16
No Training, No Problem: Rethinking Classifier-Free Guidance for Diffusion Models Paper • 2407.02687 • Published 24 days ago • 20
TabReD: A Benchmark of Tabular Machine Learning in-the-Wild Paper • 2406.19380 • Published 29 days ago • 47
Planetarium: A Rigorous Benchmark for Translating Text to Structured Planning Languages Paper • 2407.03321 • Published 23 days ago • 14
Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion Paper • 2407.01392 • Published 25 days ago • 39
Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models Paper • 2407.01906 • Published 25 days ago • 33
AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models Paper • 2304.06364 • Published Apr 13, 2023 • 2
ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection Paper • 2203.09509 • Published Mar 17, 2022 • 1
μ-Bench: A Vision-Language Benchmark for Microscopy Understanding Paper • 2407.01791 • Published 25 days ago • 5
FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds Paper • 2407.01494 • Published 25 days ago • 12
Revealing Fine-Grained Values and Opinions in Large Language Models Paper • 2406.19238 • Published 29 days ago • 13
What Matters in Detecting AI-Generated Videos like Sora? Paper • 2406.19568 • Published 29 days ago • 13
To Forget or Not? Towards Practical Knowledge Unlearning for Large Language Models Paper • 2407.01920 • Published 25 days ago • 13
Understanding Alignment in Multimodal LLMs: A Comprehensive Study Paper • 2407.02477 • Published 24 days ago • 20
Consistency Flow Matching: Defining Straight Flows with Velocity Consistency Paper • 2407.02398 • Published 24 days ago • 14
MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention Paper • 2407.02490 • Published 24 days ago • 23
Agentless: Demystifying LLM-based Software Engineering Agents Paper • 2407.01489 • Published 25 days ago • 41
OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation Paper • 2407.02371 • Published 24 days ago • 47
Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems Paper • 2407.01370 • Published 25 days ago • 80
TokenPacker: Efficient Visual Projector for Multimodal LLM Paper • 2407.02392 • Published 24 days ago • 20
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output Paper • 2407.03320 • Published 23 days ago • 88
MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment Paper • 2406.19736 • Published 29 days ago • 1
Efficient World Models with Context-Aware Tokenization Paper • 2406.19320 • Published 29 days ago • 7
Arboretum: A Large Multimodal Dataset Enabling AI for Biodiversity Paper • 2406.17720 • Published Jun 25 • 7
Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning Paper • 2407.00617 • Published 27 days ago • 7
ProgressGym: Alignment with a Millennium of Moral Progress Paper • 2406.20087 • Published 28 days ago • 3
The SIFo Benchmark: Investigating the Sequential Instruction Following Ability of Large Language Models Paper • 2406.19999 • Published 28 days ago • 3
DogeRM: Equipping Reward Models with Domain Knowledge through Model Merging Paper • 2407.01470 • Published 25 days ago • 5
Show Less, Instruct More: Enriching Prompts with Definitions and Guidelines for Zero-Shot NER Paper • 2407.01272 • Published 25 days ago • 8
Token Erasure as a Footprint of Implicit Vocabulary Items in LLMs Paper • 2406.20086 • Published 28 days ago • 3
SVG: 3D Stereoscopic Video Generation via Denoising Frame Matrix Paper • 2407.00367 • Published 28 days ago • 9
Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language Paper • 2406.20085 • Published 28 days ago • 9
Accurate Prediction of Ligand-Protein Interaction Affinities with Fine-Tuned Small Language Models Paper • 2407.00111 • Published 29 days ago • 5