CARP: Visuomotor Policy Learning
via Coarse-to-Fine Autoregressive Prediction
Zhefei Gong1,
Pengxiang Ding12,
Shangke Lyu1,
Siteng Huang12,
Mingyang Sun12,
Wei Zhao1,
Zhaoxin Fan3,
Donglin Wang1
1Westlake University, 2Zhejiang University,
3Beijing Advanced Innovation Center for Future Blockchain and Privacy Computing
Overview
TL;DR: We introduce the Coarse-to-Fine AutoRegressive Policy (CARP), a novel paradigm for visuomotor policy learning that redefines autoregressive action generation as a coarse-to-fine, next-scale prediction process.
The left panel shows the final predicted trajectories for each task, with CARP producing smoother and more consistent paths than Diffusion Policy (DP). The right panel visualizes intermediate trajectories during the refinement process for CARP (top-right) and DP (bottom-right). DP displays considerable redundancy, resulting in slower processing and unstable training, as illustrated by 6 selected steps among 100 denoising steps. In contrast, CARP achieves efficient trajectory refinement across all 4 scales, with each step contributing meaningful updates.
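To make the idea of refining a trajectory across a handful of scales concrete, here is a minimal NumPy sketch. It is an illustration only, not the authors' implementation: the learned components of CARP are omitted, and the function names (`resample`, `coarse_to_fine_residuals`, `refine`), the scale schedule `(1, 2, 4, 8)`, and the use of linear interpolation are all assumptions chosen for the example. The sketch decomposes a ground-truth action sequence into per-scale residuals and then rebuilds it in 4 steps, coarsest first.

```python
import numpy as np

def resample(seq, length):
    """Resize a (T, D) sequence to (length, D) along the time axis."""
    seq = np.asarray(seq, dtype=float)
    T = seq.shape[0]
    if T == 1:
        # A single coarse value is broadcast over the whole horizon.
        return np.repeat(seq, length, axis=0)
    if length == 1:
        # The coarsest scale keeps only the temporal mean.
        return seq.mean(axis=0, keepdims=True)
    src = np.linspace(0.0, 1.0, T)
    dst = np.linspace(0.0, 1.0, length)
    cols = [np.interp(dst, src, seq[:, d]) for d in range(seq.shape[1])]
    return np.stack(cols, axis=1)

def coarse_to_fine_residuals(actions, scales=(1, 2, 4, 8)):
    """Split actions into residuals, one per scale, coarsest first.

    `scales` is a hypothetical schedule; the last entry should equal the
    horizon so that the final step can recover the sequence exactly.
    """
    actions = np.asarray(actions, dtype=float)
    horizon = actions.shape[0]
    recon = np.zeros_like(actions)
    residuals = []
    for s in scales:
        # Encode what the coarser scales have not explained yet.
        r = resample(actions - recon, s)
        residuals.append(r)
        recon = recon + resample(r, horizon)
    return residuals

def refine(residuals, horizon):
    """Return the intermediate trajectory after each scale is added."""
    traj = np.zeros((horizon, residuals[0].shape[1]))
    steps = []
    for r in residuals:
        traj = traj + resample(r, horizon)
        steps.append(traj.copy())
    return steps

if __name__ == "__main__":
    t = np.linspace(0.0, np.pi, 8)
    demo = np.stack([np.sin(t), np.cos(t)], axis=1)  # toy 2-D action sequence
    for i, step in enumerate(refine(coarse_to_fine_residuals(demo), 8), 1):
        print(f"scale {i}: max abs error = {np.abs(step - demo).max():.4f}")
```

In this toy setting the residuals are computed from a known trajectory. In a learned policy, a model would instead predict each scale conditioned on the observations and the coarser scales, which is the part this sketch leaves out.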
Acknowledgment
We sincerely thank the creators of the excellent repositories, including Visual Autoregressive Model, Diffusion Policy, and Sparse Diffusion Policy, which have provided invaluable inspiration.
License
This repository is released under the MIT license. See LICENSE MIT for additional details.
Citation
If our findings contribute to your research, we would appreciate it if you could consider citing our paper in your publications.
@misc{gong2024carpvisuomotorpolicylearning,
title={CARP: Visuomotor Policy Learning via Coarse-to-Fine Autoregressive Prediction},
author={Zhefei Gong and Pengxiang Ding and Shangke Lyu and Siteng Huang and Mingyang Sun and Wei Zhao and Zhaoxin Fan and Donglin Wang},
year={2024},
eprint={2412.06782},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2412.06782},
}