Astral Drone Models

Pre-trained inference models from Astral, the open-source autonomous drone platform. These models are used in the Astral drone daemon running on Jetson Orin Nano hardware (M1-A quadcopter, M1-G ground rover).

All models are in ONNX or GGUF format, ready for on-device inference without a training environment.


Models

yolov8n_domain_v3.onnx: Aerial Domain Detector

Size: 11.7 MB · Format: ONNX · Architecture: YOLOv8n

9-class aerial domain detector trained on 48,000 sim+real images across three training rounds. Classifies the visual domain of an aerial frame (indoor corridor, outdoor open, urban rooftop, etc.) to route perception appropriately.

  • Input: [1, 3, 640, 640] float32 RGB
  • Output: standard YOLOv8 detection head
  • mAP50 after round 3: 0.384 (drone AP50: 0.087)
  • Training detail: blog post · technical report
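
As a minimal sketch of feeding a camera frame to this detector, the snippet below builds the [1, 3, 640, 640] float32 tensor listed above and passes it to ONNX Runtime. The nearest-neighbour resize (no letterboxing) and the scaling of pixel values to [0, 1] are assumptions based on stock YOLOv8 exports, not something this card confirms. `preprocess_frame` and `run_detector` are hypothetical helper names, and the raw output still needs the usual YOLOv8 decoding and non-maximum suppression.

```python
import numpy as np


def preprocess_frame(rgb: np.ndarray, size: int = 640) -> np.ndarray:
    """Convert an HxWx3 uint8 RGB frame to a [1, 3, size, size] float32 tensor."""
    h, w = rgb.shape[:2]
    # Nearest-neighbour resize via index lookup (assumption: no letterboxing).
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    resized = rgb[rows[:, None], cols[None, :]]
    # Assumption: pixel values are scaled to [0, 1], as in stock YOLOv8 exports.
    chw = resized.astype(np.float32).transpose(2, 0, 1) / 255.0
    # Add the batch dimension and make the memory layout contiguous.
    return np.ascontiguousarray(chw[None])


def run_detector(model_path: str, rgb: np.ndarray) -> np.ndarray:
    """Run the ONNX model on one frame and return its first raw output."""
    import onnxruntime as ort  # imported lazily so preprocessing works without it

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    input_name = session.get_inputs()[0].name
    return session.run(None, {input_name: preprocess_frame(rgb)})[0]
```

With the model file downloaded, `run_detector("yolov8n_domain_v3.onnx", frame)` returns the undecoded detection head output for a single frame.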

vlm_lora_v1_q4km.gguf: VLM LoRA (Q4_K_M)

Size: 1.8 GB · Format: GGUF Q4_K_M · Base: Qwen2.5-VL-3B

Vision-language model fine-tuned with LoRA on drone-perspective navigation data. Used as the semantic target selector in Astral's modular autonomy stack: it answers "which object in this frame is the navigation goal?" and hands off to the metric depth module for 3D localization.

Run with llama.cpp or any GGUF-compatible runtime. Requires the paired mmproj file below.

  • Pair with: vlm_lora_v1_mmproj.gguf
  • Role in stack: semantic target selection (not end-to-end control)
  • Architecture detail: Closing the Metric Gap

vlm_lora_v1_mmproj.gguf: VLM Multimodal Projector

Size: 1.2 GB · Format: GGUF · Base: Qwen2.5-VL-3B mmproj

Multimodal projector for vlm_lora_v1_q4km.gguf. Required: the main GGUF will not load without it.

Pass to llama.cpp with --mmproj vlm_lora_v1_mmproj.gguf.
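
The pairing above can be sketched as a small Python wrapper that assembles a llama.cpp command line. Only the `--mmproj` flag and the two file names come from this card. The `llama-mtmd-cli` binary name and the `-m`, `--image`, and `-p` flags are assumptions about a recent llama.cpp build and may differ in yours. `build_vlm_command` and `ask_vlm` are hypothetical helpers.

```python
import subprocess
from typing import List


def build_vlm_command(image_path: str, prompt: str,
                      binary: str = "llama-mtmd-cli") -> List[str]:
    """Assemble the argument list for one image-plus-prompt query."""
    return [
        binary,                                  # assumed binary name
        "-m", "vlm_lora_v1_q4km.gguf",           # main quantized model
        "--mmproj", "vlm_lora_v1_mmproj.gguf",   # paired multimodal projector
        "--image", image_path,
        "-p", prompt,
    ]


def ask_vlm(image_path: str, prompt: str) -> str:
    """Run the command and return whatever the runtime prints to stdout."""
    result = subprocess.run(
        build_vlm_command(image_path, prompt),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Passing the arguments as a list rather than a shell string avoids quoting problems when the prompt contains spaces or question marks.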


policy_v1.onnx + policy_v1.onnx.data: Reactive Policy

Size: 121 KB total · Format: ONNX (external data)

Lightweight MLP reactive policy for collision avoidance and low-level flight stabilization. Runs alongside the VLM planner: the VLM sets the goal, and the reactive policy handles moment-to-moment obstacle response.

Both files must be present in the same directory; policy_v1.onnx.data contains the weight tensors.

  • Input: state vector (normalize with policy_v1_state_norm.npy before inference)
  • Output: velocity command delta

policy_v1_state_norm.npy: State Normalization

Size: 224 B · Format: NumPy array

Mean/std normalization constants for policy_v1.onnx input. Load with numpy.load('policy_v1_state_norm.npy', allow_pickle=True).item(), which returns {'mean': ..., 'std': ...}.
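
A minimal sketch of applying these constants before calling the policy, assuming the conventional `(state - mean) / std` form, which this card does not spell out. The small `eps` guard against zero standard deviations is also an assumption, and `load_state_norm` and `normalize_state` are hypothetical helper names.

```python
import numpy as np


def load_state_norm(path: str = "policy_v1_state_norm.npy") -> dict:
    """Load the pickled {'mean': ..., 'std': ...} dictionary."""
    return np.load(path, allow_pickle=True).item()


def normalize_state(state, norm: dict, eps: float = 1e-8) -> np.ndarray:
    """Standardize a raw state vector with the stored mean and std."""
    mean = np.asarray(norm["mean"], dtype=np.float32)
    std = np.asarray(norm["std"], dtype=np.float32)
    state = np.asarray(state, dtype=np.float32)
    # eps (an assumption) avoids division by zero for constant state entries.
    return (state - mean) / (std + eps)
```

The normalized vector is what goes into policy_v1.onnx. Whether the model also expects a leading batch dimension is worth checking against the input shape the ONNX file reports.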


depth_v1.onnx: Monocular Depth Model

Size: 1.6 MB · Format: ONNX

Lightweight monocular depth estimation model for metric grounding. Converts the VLM's object identification into a 3D position estimate for the planner. Designed to run in real time on Jetson Orin Nano.

  • Input: [1, 3, H, W] float32 RGB (model handles resize)
  • Output: [1, 1, H, W] relative depth map
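
To illustrate how a depth map and a selected pixel can become a 3D position, the sketch below back-projects through a pinhole camera model. This is a generic technique, not necessarily what the Astral daemon does. The intrinsics `fx`, `fy`, `cx`, `cy` and the `scale` factor that maps relative depth to metres are hypothetical inputs that would have to come from camera calibration and a separate scale estimate.

```python
import numpy as np


def target_position(depth_map: np.ndarray, u: int, v: int,
                    fx: float, fy: float, cx: float, cy: float,
                    scale: float = 1.0) -> np.ndarray:
    """Back-project pixel (u, v) into camera coordinates (x right, y down, z forward)."""
    # depth_map has shape [1, 1, H, W]; rows index v, columns index u.
    z = float(depth_map[0, 0, v, u]) * scale  # assumption: scale converts to metres
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.array([x, y, z], dtype=np.float32)
```

A pixel at the principal point maps to a point straight ahead of the camera, which is a quick sanity check for the intrinsics.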

Usage

The Astral daemon (daemon.py) downloads these automatically via setup_models.py:

python3 setup_models.py --domain-detector --reactive-policy --depth-model --vlm-drone

Or download individually with huggingface_hub:

from huggingface_hub import hf_hub_download

# Domain detector
hf_hub_download("astralhf/astral-drone-models", "yolov8n_domain_v3.onnx")

# VLM (both files required)
hf_hub_download("astralhf/astral-drone-models", "vlm_lora_v1_q4km.gguf")
hf_hub_download("astralhf/astral-drone-models", "vlm_lora_v1_mmproj.gguf")

# Reactive policy (both files + norm required)
hf_hub_download("astralhf/astral-drone-models", "policy_v1.onnx")
hf_hub_download("astralhf/astral-drone-models", "policy_v1.onnx.data")
hf_hub_download("astralhf/astral-drone-models", "policy_v1_state_norm.npy")

# Depth
hf_hub_download("astralhf/astral-drone-models", "depth_v1.onnx")

Architecture

These models implement the separation principle: semantic understanding (VLM) is decoupled from metric geometry (depth) and low-level control (reactive policy). This separation is why end-to-end VLMs fail at drone navigation while the modular stack succeeds, as detailed in Closing the Metric Gap.
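
The decoupling can be pictured as three independent stages behind plain function interfaces. The sketch below is illustrative only: `ModularStack`, its field names, and the stage signatures are hypothetical and do not reflect the actual daemon code.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple

Pixel = Tuple[int, int]
Point3D = Tuple[float, float, float]
Velocity = Tuple[float, float, float]


@dataclass
class ModularStack:
    select_target: Callable[[Any], Pixel]      # semantic stage (VLM)
    locate: Callable[[Any, Pixel], Point3D]    # metric stage (depth)
    react: Callable[[Point3D], Velocity]       # control stage (reactive policy)

    def step(self, frame: Any) -> Velocity:
        """Run one frame through the three stages in order."""
        pixel = self.select_target(frame)
        goal = self.locate(frame, pixel)
        return self.react(goal)
```

Because each stage only sees the previous stage's output, any one of them can be swapped or tested in isolation with a stub.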

License

Models are released under CC-BY-NC-4.0. Research and non-commercial use permitted with attribution. Contact astral.us for commercial licensing.
