LMDrive: Closed-Loop End-to-End Driving with Large Language Models Paper • 2312.07488 • Published Dec 12, 2023
Visual CoT: Unleashing Chain-of-Thought Reasoning in Multi-Modal Language Models Paper • 2403.16999 • Published Mar 25, 2024 • 4
MoVA: Adapting Mixture of Vision Experts to Multimodal Context Paper • 2404.13046 • Published Apr 19, 2024 • 1
VividFace: A Diffusion-Based Hybrid Framework for High-Fidelity Video Face Swapping Paper • 2412.11279 • Published 19 days ago • 12
Causal Diffusion Transformers for Generative Modeling Paper • 2412.12095 • Published 18 days ago • 23
EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM Paper • 2412.09618 • Published 22 days ago • 21
Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models Paper • 2406.11831 • Published Jun 17, 2024 • 21
VisCoT Collection Visual CoT: Unleashing Chain-of-Thought Reasoning in the Multi-Modal Language Model • 5 items • Updated Jun 13, 2024 • 2