CognitiveDrone: A VLA Model and Evaluation Benchmark for Real-Time Cognitive Task Solving and Reasoning in UAVs
Abstract
This paper introduces CognitiveDrone, a novel Vision-Language-Action (VLA) model tailored for complex Unmanned Aerial Vehicle (UAV) tasks that demand advanced cognitive abilities. Trained on a dataset of over 8,000 simulated flight trajectories across three key categories (Human Recognition, Symbol Understanding, and Reasoning), the model generates real-time 4D action commands from first-person visual inputs and textual instructions. To further enhance performance in intricate scenarios, we propose CognitiveDrone-R1, which integrates an additional Vision-Language Model (VLM) reasoning module that simplifies task directives before high-frequency control. Experimental evaluations on our open-source benchmark, CognitiveDroneBench, show that while a racing-oriented model (RaceVLA) achieves an overall success rate of 31.3%, the base CognitiveDrone model reaches 59.6% and CognitiveDrone-R1 attains 77.2%. These results demonstrate improvements of up to 30% on critical cognitive tasks, underscoring the effectiveness of incorporating advanced reasoning capabilities into UAV control systems. Our contributions include the development of a state-of-the-art VLA model for UAV control and the introduction of the first dedicated benchmark for assessing cognitive tasks in drone operations. The complete repository is available at cognitivedrone.github.io
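The abstract describes a two-stage pipeline: a slower VLM reasoning module first rewrites a cognitively demanding instruction into a simpler directive, and the VLA policy then maps the current first-person frame plus that directive to a 4D action command. The following Python sketch illustrates that control-loop structure only; the function names (`simplify_instruction`, `VLAPolicy`), the toy rewriting rule, and the interpretation of the 4D command as body-frame velocities plus yaw rate are all assumptions for illustration, not the paper's actual implementation:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Action4D:
    # Assumed layout of the 4D command: body-frame velocities + yaw rate.
    vx: float
    vy: float
    vz: float
    yaw_rate: float

def simplify_instruction(instruction: str) -> str:
    """Stand-in for the VLM reasoning module: reduce a cognitive task
    to a direct motion directive. A real system would query a VLM here."""
    # Toy rule: a reasoning task such as "fly through the gate showing 2+2"
    # is resolved to a concrete target before high-frequency control runs.
    if "2+2" in instruction:
        return "fly through the gate labeled 4"
    return instruction

class VLAPolicy:
    """Stand-in VLA policy: maps (image, directive) -> Action4D."""
    def act(self, image: List[List[int]], directive: str) -> Action4D:
        # Placeholder: always fly forward. A real policy would be a learned
        # model conditioned on the first-person frame and the directive.
        return Action4D(vx=1.0, vy=0.0, vz=0.0, yaw_rate=0.0)

def control_step(policy: VLAPolicy, image: List[List[int]],
                 instruction: str) -> Action4D:
    """One cycle: slow reasoning stage, then fast control stage."""
    directive = simplify_instruction(instruction)
    return policy.act(image, directive)
```

In this sketch the reasoning stage runs once per instruction while `policy.act` would run at the control rate, which matches the abstract's motivation for simplifying directives before high-frequency control.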
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- RaceVLA: VLA-based Racing Drone Navigation with Human-like Behaviour (2025)
- Evolution 6.0: Evolving Robotic Capabilities Through Generative Design (2025)
- Large Language Models for Multi-Robot Systems: A Survey (2025)
- DexVLA: Vision-Language Model with Plug-In Diffusion Expert for General Robot Control (2025)
- Scalable, Training-Free Visual Language Robotics: a modular multi-model framework for consumer-grade GPUs (2025)
- VLA-Cache: Towards Efficient Vision-Language-Action Model via Adaptive Token Caching in Robotic Manipulation (2025)
- Language and Planning in Robotic Navigation: A Multilingual Evaluation of State-of-the-Art Models (2025)