arXiv:2407.05530

This&That: Language-Gesture Controlled Video Generation for Robot Planning

Published on Jul 8
· Submitted by HikariDawn on Jul 11

Abstract

We propose a robot learning method for communicating, planning, and executing a wide range of tasks, dubbed This&That. We achieve robot planning for general tasks by leveraging the power of video generative models trained on internet-scale data containing rich physical and semantic context. In this work, we tackle three fundamental challenges in video-based planning: 1) unambiguous task communication with simple human instructions, 2) controllable video generation that respects user intents, and 3) translating visual planning into robot actions. We propose language-gesture conditioning to generate videos, which is both simpler and clearer than existing language-only methods, especially in complex and uncertain environments. We then suggest a behavioral cloning design that seamlessly incorporates the video plans. This&That demonstrates state-of-the-art effectiveness in addressing the above three challenges, and justifies the use of video generation as an intermediate representation for generalizable task planning and execution. Project website: https://cfeng16.github.io/this-and-that/.
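To make the abstract's language-gesture conditioning concrete: a text instruction is embedded, the user's "this"/"that" gesture (e.g., a click on the first frame) is rendered as a spatial heatmap, and the two are fused into a single conditioning signal for the video generator. Below is a minimal PyTorch sketch of that idea; it is an illustrative assumption, not the authors' released code, and all module names, dimensions, and the Gaussian-heatmap encoding are hypothetical.

```python
# Hypothetical sketch of language-gesture conditioning (not the paper's code):
# fuse a text embedding with gesture clicks rendered as a spatial heatmap.
import torch
import torch.nn as nn

class LanguageGestureConditioner(nn.Module):
    def __init__(self, text_dim=512, cond_dim=256, frame_size=64):
        super().__init__()
        self.frame_size = frame_size
        self.text_proj = nn.Linear(text_dim, cond_dim)
        # Encode the 1-channel gesture heatmap into the same conditioning space.
        self.gesture_enc = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.SiLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, cond_dim),
        )
        self.fuse = nn.Linear(2 * cond_dim, cond_dim)

    def gesture_heatmap(self, points_xy):
        # points_xy: (B, K, 2) normalized click coordinates in [0, 1].
        B, K, _ = points_xy.shape
        H = W = self.frame_size
        ys = torch.linspace(0, 1, H).view(1, 1, H, 1)
        xs = torch.linspace(0, 1, W).view(1, 1, 1, W)
        px = points_xy[..., 0].view(B, K, 1, 1)
        py = points_xy[..., 1].view(B, K, 1, 1)
        # Gaussian bump around each clicked point, summed over the K clicks.
        heat = torch.exp(-((xs - px) ** 2 + (ys - py) ** 2) / (2 * 0.02 ** 2))
        return heat.sum(dim=1, keepdim=True)  # (B, 1, H, W)

    def forward(self, text_emb, points_xy):
        t = self.text_proj(text_emb)                            # (B, cond_dim)
        g = self.gesture_enc(self.gesture_heatmap(points_xy))   # (B, cond_dim)
        return self.fuse(torch.cat([t, g], dim=-1))             # fused conditioning
```

In practice this fused conditioning would be injected into a video diffusion denoiser (for example via cross-attention); the sketch stops at producing the conditioning vector.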

Community

Paper author · Paper submitter

This&That is a dynamic robot video generation model conditioned on language and simple gestures! Moreover, we propose the Diffusion Video to Action (DiVA) model to translate the generated videos into robot actions in the rollout environment. Homepage: https://cfeng16.github.io/this-and-that/
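The video-to-action stage described here is, at its core, a behavioral-cloning problem: a policy consumes the current observation together with frames of the generated video plan and regresses the next robot action. The sketch below illustrates such a video-conditioned policy; it is a hedged illustration under assumed shapes and names, not the paper's DiVA architecture.

```python
# Hedged sketch of a video-plan-conditioned behavioral-cloning policy
# (my own illustration, not the paper's DiVA implementation).
import torch
import torch.nn as nn

class VideoConditionedPolicy(nn.Module):
    def __init__(self, action_dim=7, n_plan_frames=4):
        super().__init__()
        in_ch = 3 * (1 + n_plan_frames)  # current frame + planned future frames
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, 32, 4, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.SiLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Sequential(
            nn.Linear(128, 128), nn.SiLU(), nn.Linear(128, action_dim)
        )

    def forward(self, obs, plan_frames):
        # obs: (B, 3, H, W); plan_frames: (B, T, 3, H, W) from the video generator.
        B, T, C, H, W = plan_frames.shape
        x = torch.cat([obs, plan_frames.reshape(B, T * C, H, W)], dim=1)
        return self.head(self.encoder(x))

def bc_loss(policy, obs, plan_frames, expert_action):
    # Behavioral-cloning step on (observation, video plan, expert action) triples.
    return nn.functional.mse_loss(policy(obs, plan_frames), expert_action)
```

Training would minimize this regression loss over demonstration data in which each observation is paired with a generated (or ground-truth) video plan and the expert's action.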

