Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward Paper • 2404.01258 • Published Apr 1 • 10
Transformer-Lite: High-efficiency Deployment of Large Language Models on Mobile Phone GPUs Paper • 2403.20041 • Published Mar 29 • 34
Snap-it, Tap-it, Splat-it: Tactile-Informed 3D Gaussian Splatting for Reconstructing Challenging Surfaces Paper • 2403.20275 • Published Mar 29 • 8
InseRF: Text-Driven Generative Object Insertion in Neural 3D Scenes Paper • 2401.05335 • Published Jan 10 • 26
From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations Paper • 2401.01885 • Published Jan 3 • 27
InstaGen: Enhancing Object Detection by Training on Synthetic Dataset Paper • 2402.05937 • Published Feb 8 • 11
TRIP: Temporal Residual Learning with Image Noise Prior for Image-to-Video Diffusion Models Paper • 2403.17005 • Published Mar 25 • 13
AudioPaLM: A Large Language Model That Can Speak and Listen Paper • 2306.12925 • Published Jun 22, 2023 • 52
Efficient Video Diffusion Models via Content-Frame Motion-Latent Decomposition Paper • 2403.14148 • Published Mar 21 • 17
RadSplat: Radiance Field-Informed Gaussian Splatting for Robust Real-Time Rendering with 900+ FPS Paper • 2403.13806 • Published Mar 20 • 18
VSTAR: Generative Temporal Nursing for Longer Dynamic Video Synthesis Paper • 2403.13501 • Published Mar 20 • 9
HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models Paper • 2403.13447 • Published Mar 20 • 17
LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models Paper • 2403.13372 • Published Mar 20 • 58
Compress3D: a Compressed Latent Space for 3D Generation from a Single Image Paper • 2403.13524 • Published Mar 20 • 8
Be-Your-Outpainter: Mastering Video Outpainting through Input-Specific Adaptation Paper • 2403.13745 • Published Mar 20 • 11
DepthFM: Fast Monocular Depth Estimation with Flow Matching Paper • 2403.13788 • Published Mar 20 • 16
Magic Fixup: Streamlining Photo Editing by Watching Dynamic Videos Paper • 2403.13044 • Published Mar 19 • 14
SceneScript: Reconstructing Scenes With An Autoregressive Structured Language Model Paper • 2403.13064 • Published Mar 19 • 31
IDAdapter: Learning Mixed Features for Tuning-Free Personalization of Text-to-Image Models Paper • 2403.13535 • Published Mar 20 • 21
Mora: Enabling Generalist Video Generation via A Multi-Agent Framework Paper • 2403.13248 • Published Mar 20 • 76
Analyzing and Improving the Training Dynamics of Diffusion Models Paper • 2312.02696 • Published Dec 5, 2023 • 31
VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis Paper • 2403.08764 • Published Mar 13 • 34
Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding Paper • 2306.02858 • Published Jun 5, 2023 • 18
Generative agent-based modeling with actions grounded in physical, social, or digital space using Concordia Paper • 2312.03664 • Published Dec 6, 2023 • 8
Schrodinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis Paper • 2312.03491 • Published Dec 6, 2023 • 34
MoAI: Mixture of All Intelligence for Large Language and Vision Models Paper • 2403.07508 • Published Mar 12 • 75
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers Paper • 2402.19479 • Published Feb 29 • 32
Orca-Math: Unlocking the potential of SLMs in Grade School Math Paper • 2402.14830 • Published Feb 16 • 24
CLoVe: Encoding Compositional Language in Contrastive Vision-Language Models Paper • 2402.15021 • Published Feb 22 • 12
GPTVQ: The Blessing of Dimensionality for LLM Quantization Paper • 2402.15319 • Published Feb 23 • 19
AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning Paper • 2402.15506 • Published Feb 23 • 12
DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation Paper • 2309.16653 • Published Sep 28, 2023 • 45
Bootstrapping LLM-based Task-Oriented Dialogue Agents via Self-Talk Paper • 2401.05033 • Published Jan 10 • 15
ZeroShape: Regression-based Zero-shot Shape Reconstruction Paper • 2312.14198 • Published Dec 21, 2023 • 7
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones Paper • 2312.16862 • Published Dec 28, 2023 • 30
Neural feels with neural fields: Visuo-tactile perception for in-hand manipulation Paper • 2312.13469 • Published Dec 20, 2023 • 10
FlashDecoding++: Faster Large Language Model Inference on GPUs Paper • 2311.01282 • Published Nov 2, 2023 • 35
DPM-Solver-v3: Improved Diffusion ODE Solver with Empirical Model Statistics Paper • 2310.13268 • Published Oct 20, 2023 • 17
ModelScope-Agent: Building Your Customizable Agent System with Open-source Large Language Models Paper • 2309.00986 • Published Sep 2, 2023 • 17