CoDA: Coordinated Diffusion Noise Optimization for Whole-Body Manipulation of Articulated Objects
Abstract
A coordinated diffusion noise optimization framework improves the synthesis of whole-body manipulation of articulated objects by combining specialized diffusion models for body and hand motions with a unified basis point set representation for precise hand-object interaction.
Synthesizing whole-body manipulation of articulated objects, including body motion, hand motion, and object motion, is a critical yet challenging task with broad applications in virtual humans and robotics. The core challenges are twofold. First, achieving realistic whole-body motion requires tight coordination between the hands and the rest of the body, as their movements are interdependent during manipulation. Second, articulated object manipulation typically involves high degrees of freedom and demands higher precision, often requiring the fingers to be placed at specific regions to actuate movable parts. To address these challenges, we propose a novel coordinated diffusion noise optimization framework. Specifically, we perform noise-space optimization over three specialized diffusion models for the body, left hand, and right hand, each trained on its own motion dataset to improve generalization. Coordination naturally emerges through gradient flow along the human kinematic chain, allowing the global body posture to adapt in response to hand motion objectives with high fidelity. To further enhance precision in hand-object interaction, we adopt a unified representation based on basis point sets (BPS), where end-effector positions are encoded as distances to the same BPS used for object geometry. This unified representation captures fine-grained spatial relationships between the hand and articulated object parts, and the resulting trajectories serve as targets to guide the optimization of diffusion noise, producing highly accurate interaction motion. We conduct extensive experiments demonstrating that our method outperforms existing approaches in motion quality and physical plausibility, and enables various capabilities such as object pose control, simultaneous walking and manipulation, and whole-body generation from hand-only data.
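The coordination idea in the abstract can be illustrated with a deliberately tiny sketch. It is not the authors' implementation: `body_model` and `hand_model` are hypothetical stand-ins for frozen diffusion samplers (each simply maps one noise value to one joint angle), the "kinematic chain" is a planar two-link arm, and finite-difference gradients stand in for backpropagation. The only point is that a loss defined on the fingertip updates both noise variables, because the fingertip position depends on the body joint as well as the hand joint.

```python
import math

# Hypothetical stand-ins for frozen, pretrained samplers (assumption: a real
# sampler would run many denoising steps and output a full motion sequence).
def body_model(z_body):
    return math.tanh(z_body) * math.pi / 2   # body joint angle

def hand_model(z_hand):
    return math.tanh(z_hand) * math.pi / 2   # hand joint angle, relative to body

def fingertip(z_body, z_hand, upper=1.0, lower=0.5):
    """Forward kinematics of a planar 2-link chain: body joint, then hand joint."""
    a = body_model(z_body)
    b = a + hand_model(z_hand)
    return (upper * math.cos(a) + lower * math.cos(b),
            upper * math.sin(a) + lower * math.sin(b))

def loss(z, target):
    """Squared distance between the fingertip and the target position."""
    x, y = fingertip(z[0], z[1])
    return (x - target[0]) ** 2 + (y - target[1]) ** 2

def optimize_noise(target, z_init=(0.1, 0.1), steps=5000, lr=0.05, eps=1e-5):
    """Gradient descent on the noise variables; the model functions stay fixed."""
    z = list(z_init)
    for _ in range(steps):
        grad = []
        for i in range(len(z)):
            z_plus, z_minus = list(z), list(z)
            z_plus[i] += eps
            z_minus[i] -= eps
            # Central finite difference as a stand-in for autodiff.
            grad.append((loss(z_plus, target) - loss(z_minus, target)) / (2 * eps))
        z = [zi - lr * gi for zi, gi in zip(z, grad)]
    return z

target = (0.8, 1.0)                 # hypothetical fingertip target
z_opt = optimize_noise(target)
print("optimized noise:", z_opt)
print("fingertip:", fingertip(*z_opt))
```

In this toy setup the hand-level objective changes the body noise as well as the hand noise, which mirrors, in a much simplified form, the gradient flow along the kinematic chain described above.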
Community
(1) This paper focuses on generating whole-body (body and fingers) manipulation of articulated objects from text input.
(2) The key idea is a novel coordinated diffusion noise optimization framework, where we perform noise-space optimization over three specialized diffusion models for the body, left hand, and right hand. The coordination naturally emerges through gradient flow along the human kinematic chain.
(3) To improve the precision of manipulation, we adopt a unified representation based on basis point sets (BPS), where end-effector positions are encoded as distances to the same BPS used for object geometry. The resulting trajectories serve as targets to guide the optimization of diffusion noise, producing highly accurate motion.
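As a rough illustration of point (3), the sketch below encodes an object point cloud and an end-effector position against the same set of basis points. It assumes the common BPS formulation (for each basis point, the distance to the nearest point of the encoded set); the helper names, the sampling scheme, and the example coordinates are hypothetical and not taken from the paper.

```python
import math
import random

def sample_basis_points(n, radius=1.0, seed=0):
    """Sample n fixed basis points inside a ball of the given radius (rejection sampling)."""
    rng = random.Random(seed)
    points = []
    while len(points) < n:
        p = tuple(rng.uniform(-radius, radius) for _ in range(3))
        if math.dist(p, (0.0, 0.0, 0.0)) <= radius:
            points.append(p)
    return points

def bps_encode(points, basis):
    """For each basis point, return the distance to the nearest point in `points`."""
    return [min(math.dist(b, p) for p in points) for b in basis]

basis = sample_basis_points(64)

# Hypothetical object part (e.g. a few points on a handle) and one fingertip position.
object_points = [(0.2, 0.0, 0.1), (0.25, 0.0, 0.1), (0.3, 0.0, 0.1)]
fingertip_position = [(0.22, 0.01, 0.1)]

object_code = bps_encode(object_points, basis)     # one distance per basis point
hand_code = bps_encode(fingertip_position, basis)  # same basis, so same layout

print(len(object_code), len(hand_code))
```

Because both vectors are indexed by the same basis points, entry i of the hand code and entry i of the object code are distances measured from the same location in space, which is what makes the two encodings directly comparable.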
Check out our project page at https://phj128.github.io/page/CoDA/index.html
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- OpenHOI: Open-World Hand-Object Interaction Synthesis with Multimodal Large Language Model (2025)
- IKMo: Image-Keyframed Motion Generation with Trajectory-Pose Conditioned Motion Diffusion Model (2025)
- MaskedManipulator: Versatile Whole-Body Control for Loco-Manipulation (2025)
- Separate to Collaborate: Dual-Stream Diffusion Model for Coordinated Piano Hand Motion Synthesis (2025)
- Absolute Coordinates Make Motion Generation Easy (2025)
- UniMoGen: Universal Motion Generation (2025)
- MEgoHand: Multimodal Egocentric Hand-Object Interaction Motion Generation (2025)