EvoDriveVLA: Evolving Autonomous Driving VLA Models via Collaborative Perception-Planning Distillation

Jiajun Cao^1,2†, Xiaoan Zhang^1,2†, Xiaobao Wei^1†, Liyuqiu Huang^1,2, Wang Zijian^2, Hanzhen Zhang^2, Zhengyu Jia^2, Wei Mao^2, Xianming Liu^2, Shuchang Zhou^2, Yang Wang^2*, Shanghang Zhang^1*

¹Peking University, ²XPENG

† Equal contribution

* Corresponding authors

Vision-Language-Action (VLA) models have shown great promise for autonomous driving, yet they suffer from degraded perception once the visual encoder is unfrozen and struggle with instability that accumulates over long-term planning. To address these challenges, we propose EvoDriveVLA, a novel collaborative perception-planning distillation framework that integrates self-anchored perceptual constraints and oracle-guided trajectory optimization. Specifically, self-anchored visual distillation uses a self-anchor teacher to impose visual anchoring constraints, regularizing the student's representations through trajectory-guided key-region awareness. In parallel, oracle-guided trajectory distillation employs a future-aware oracle teacher with coarse-to-fine trajectory refinement and Monte Carlo dropout sampling to produce high-quality trajectory candidates, from which the optimal trajectory is selected to guide the student's prediction.
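The two distillation signals can be illustrated with a toy sketch. This card does not specify the teacher architectures, the candidate-selection criterion, or the loss functions, so everything below is an assumption for illustration only, not the EvoDriveVLA implementation: a linear head stands in for the oracle teacher, Monte Carlo dropout is emulated with inverted dropout kept active at sampling time, the "optimal" candidate is assumed to be the one with the lowest average displacement error (ADE) against the ground-truth future trajectory, both losses are assumed to be mean squared errors, and the key-region weights are assumed to be precomputed per visual token. Coarse-to-fine refinement is not modeled. All function names are hypothetical.

```python
import numpy as np


def mc_dropout_candidates(features, weight, num_samples=8, drop_p=0.1, rng=None):
    """Sample trajectory candidates from a toy linear head with dropout kept active.

    features: (D,) pooled feature vector (hypothetical oracle-teacher embedding)
    weight:   (D, T*2) linear head mapping features to T (x, y) waypoints
    Returns an array of shape (num_samples, T, 2).
    """
    rng = np.random.default_rng() if rng is None else rng
    T = weight.shape[1] // 2
    candidates = []
    for _ in range(num_samples):
        # Inverted dropout: zero each feature with prob drop_p, rescale the survivors.
        mask = rng.random(features.shape) >= drop_p
        dropped = features * mask / (1.0 - drop_p)
        candidates.append((dropped @ weight).reshape(T, 2))
    return np.stack(candidates)


def select_best(candidates, gt_future):
    """Assumed criterion: pick the candidate with the lowest ADE to the ground truth."""
    ade = np.linalg.norm(candidates - gt_future[None], axis=-1).mean(axis=-1)
    best = int(np.argmin(ade))
    return candidates[best], ade[best]


def trajectory_distill_loss(student_traj, target_traj):
    """Assumed MSE between the student's waypoints and the selected teacher trajectory."""
    return float(np.mean((student_traj - target_traj) ** 2))


def key_region_feature_loss(student_feats, anchor_feats, region_weights):
    """Weighted MSE between student tokens and frozen self-anchor tokens.

    student_feats, anchor_feats: (N, C) visual tokens
    region_weights: (N,) non-negative weights, larger for tokens near the planned path
    """
    per_token = np.mean((student_feats - anchor_feats) ** 2, axis=-1)
    return float(np.sum(region_weights * per_token) / np.sum(region_weights))


# Toy usage with random data.
rng = np.random.default_rng(0)
D, T = 16, 6
feats = rng.normal(size=D)
W = rng.normal(size=(D, T * 2)) * 0.1
gt = rng.normal(size=(T, 2))

cands = mc_dropout_candidates(feats, W, num_samples=8, rng=rng)
best, best_ade = select_best(cands, gt)
loss = trajectory_distill_loss(np.zeros((T, 2)), best)
```

Keeping dropout active at sampling time turns a single deterministic head into a cheap source of diverse candidates; a future-aware selector can then choose among them, and only the chosen trajectory is used as the student's target.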

📜 Citing

If you find EvoDriveVLA useful in your research or applications, please consider giving us a star 🌟 and citing it with the following BibTeX entry:

🙏 Acknowledgement

Our work is primarily based on the following codebases: Impromptu-VLA, FSDrive, and OmniDrive.


Paper for Paipai-zxa/EvoDriveVLA

FutureSightDrive: Thinking Visually with Spatio-Temporal CoT for Autonomous Driving

Paper • 2505.17685 • Published May 23, 2025