SV4D: Dynamic 3D Content Generation with Multi-Frame and Multi-View Consistency Paper • 2407.17470 • Published 2 days ago • 9
Visual Haystacks: Answering Harder Questions About Sets of Images Paper • 2407.13766 • Published 8 days ago • 2
CGB-DM: Content and Graphic Balance Layout Generation with Transformer-based Diffusion Model Paper • 2407.15233 • Published 5 days ago • 6
Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models Paper • 2407.15642 • Published 4 days ago • 7
Retrieval-Enhanced Machine Learning: Synthesis and Opportunities Paper • 2407.12982 • Published 9 days ago • 4
A Comparative Study on Automatic Coding of Medical Letters with Explainability Paper • 2407.13638 • Published 8 days ago • 4
Zero-shot Cross-Lingual Transfer for Synthetic Data Generation in Grammatical Error Detection Paper • 2407.11854 • Published 10 days ago • 2
OmniNOCS: A unified NOCS dataset and model for 3D lifting of 2D objects Paper • 2407.08711 • Published 15 days ago • 5
BiGym: A Demo-Driven Mobile Bi-Manual Manipulation Benchmark Paper • 2407.07788 • Published 16 days ago • 1
Adapting LLMs to Hebrew: Unveiling DictaLM 2.0 with Enhanced Vocabulary and Instruction Capabilities Paper • 2407.07080 • Published 17 days ago • 20
ANOLE: An Open, Autoregressive, Native Large Multimodal Models for Interleaved Image-Text Generation Paper • 2407.06135 • Published 18 days ago • 19
CRiM-GS: Continuous Rigid Motion-Aware Gaussian Splatting from Motion Blur Images Paper • 2407.03923 • Published 22 days ago • 7
HEMM: Holistic Evaluation of Multimodal Foundation Models Paper • 2407.03418 • Published 23 days ago • 8
ProgressGym: Alignment with a Millennium of Moral Progress Paper • 2406.20087 • Published 28 days ago • 3
OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation Paper • 2407.02371 • Published 24 days ago • 47
Accurate Prediction of Ligand-Protein Interaction Affinities with Fine-Tuned Small Language Models Paper • 2407.00111 • Published 29 days ago • 5
Token Erasure as a Footprint of Implicit Vocabulary Items in LLMs Paper • 2406.20086 • Published 28 days ago • 3
RealTalk: Real-time and Realistic Audio-driven Face Generation with 3D Facial Prior-guided Identity Alignment Network Paper • 2406.18284 • Published about 1 month ago • 17
Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language Paper • 2406.20085 • Published 28 days ago • 9
Wavelets Are All You Need for Autoregressive Image Generation Paper • 2406.19997 • Published 28 days ago • 27
OmniJARVIS: Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents Paper • 2407.00114 • Published 29 days ago • 12
Efficient World Models with Context-Aware Tokenization Paper • 2406.19320 • Published 29 days ago • 7
Benchmarking Mental State Representations in Language Models Paper • 2406.17513 • Published Jun 25 • 3
Fast and Uncertainty-Aware SVBRDF Recovery from Multi-View Capture using Frequency Domain Analysis Paper • 2406.17774 • Published Jun 25 • 3
Large Language Models Assume People are More Rational than We Really are Paper • 2406.17055 • Published Jun 24 • 4
MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning Paper • 2406.17770 • Published Jun 25 • 18
video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models Paper • 2406.15704 • Published Jun 22 • 5
Can Few-shot Work in Long-Context? Recycling the Context to Generate Demonstrations Paper • 2406.13632 • Published Jun 19 • 5
GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities Paper • 2406.11768 • Published Jun 17 • 20
Not All Prompts Are Made Equal: Prompt-based Pruning of Text-to-Image Diffusion Models Paper • 2406.12042 • Published Jun 17 • 8
VideoLLM-online: Online Video Large Language Model for Streaming Video Paper • 2406.11816 • Published Jun 17 • 20
Evaluating Open Language Models Across Task Types, Application Domains, and Reasoning Types: An In-Depth Experimental Analysis Paper • 2406.11402 • Published Jun 17 • 6
Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing Reliability, Reproducibility, and Practicality Paper • 2406.08845 • Published Jun 13 • 8
Designing a Dashboard for Transparency and Control of Conversational AI Paper • 2406.07882 • Published Jun 12 • 9
MaskLID: Code-Switching Language Identification through Iterative Masking Paper • 2406.06263 • Published Jun 10 • 5
Decoding the Diversity: A Review of the Indic AI Research Landscape Paper • 2406.09559 • Published Jun 13 • 5
GaussianSR: 3D Gaussian Super-Resolution with 2D Diffusion Priors Paper • 2406.10111 • Published Jun 14 • 6
Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering Paper • 2406.10208 • Published Jun 14 • 21
Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models Paper • 2406.09403 • Published Jun 13 • 18
MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding Paper • 2406.09411 • Published Jun 13 • 18
Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning Paper • 2406.09170 • Published Jun 13 • 24
HelpSteer2: Open-source dataset for training top-performing reward models Paper • 2406.08673 • Published Jun 12 • 14
CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery Paper • 2406.08587 • Published Jun 12 • 15
Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling Paper • 2406.07522 • Published Jun 11 • 35
TC-Bench: Benchmarking Temporal Compositionality in Text-to-Video and Image-to-Video Generation Paper • 2406.08656 • Published Jun 12 • 7
EMMA: Your Text-to-Image Diffusion Model Can Secretly Accept Multi-Modal Prompts Paper • 2406.09162 • Published Jun 13 • 13
Mistral-C2F: Coarse to Fine Actor for Analytical and Reasoning Enhancement in RLHF and Effective-Merged LLMs Paper • 2406.08657 • Published Jun 12 • 9