FADA-SKD (4B)

Fetal Anatomy Delineation and Analysis — Selective Knowledge Distillation variant

Demo Video

Watch the FADA Demo

Model Description

FADA-SKD (4B) is a vision-language model fine-tuned from Qwen3.5-VL (4B) using LoRA adapters and Selective Knowledge Distillation (SKD) from four ultrasound foundation models (FetalCLIP, UltraSAM, USF-MAE, UltraFedFM).

The model performs five fetal ultrasound tasks within a single end-to-end pipeline: clinical interpretation, anatomical classification, structure mapping, bounding-box detection, and polygon segmentation.

Key Features

  • Unified multi-task: Interpretation, classification, detection, segmentation, and keypoint localization in a single model
  • Selective KD: Feature distillation applied only to annotation tasks, preserving interpretation quality
  • 5-phase pipeline: Interpret, Classify, Map, Detect, Segment
  • Expert validated: Mean sonographer score 1.975/3.0 across 237 images (1 = clinically acceptable)
  • Dual deployment: Autonomous mode and Human-in-the-Loop mode

Performance (4,478 test samples)

Metric                   Score
mAP@0.50                 0.7671
mAP@0.75                 0.4402
Dice                     0.8820
IoU                      0.8149
Classification accuracy  0.8379
Sonographer score        1.975/3
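
For readers unfamiliar with the overlap metrics, the sketch below shows how box IoU (the matching criterion behind mAP@0.50 and mAP@0.75) and mask Dice/IoU are conventionally defined. It is an illustration only: the function names and the set-of-pixels mask representation are assumptions, and the evaluation code that produced the scores in this card is not shown here.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def mask_dice_iou(pred, target):
    """Dice and IoU for two binary masks given as sets of (row, col) pixels."""
    inter = len(pred & target)
    union = len(pred | target)
    total = len(pred) + len(target)
    # Two empty masks are treated as a perfect match (a common convention).
    dice = 2 * inter / total if total else 1.0
    iou = inter / union if union else 1.0
    return dice, iou


# A prediction covering 60% of the ground-truth box counts as a match
# at the 0.50 threshold but not at the stricter 0.75 threshold.
overlap = box_iou((0, 0, 10, 10), (0, 0, 10, 6))
matched_at_050 = overlap >= 0.50
matched_at_075 = overlap >= 0.75
```

The gap between mAP@0.50 and mAP@0.75 reflects this threshold effect: a box can be roughly right without being tightly aligned.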

Usage

Selective Knowledge Distillation

The core innovation: feature-level alignment from four domain-specific teachers is applied exclusively to annotation training data (detection, segmentation, classification), while interpretation training receives only supervised fine-tuning. This selective strategy outperforms full distillation across all tasks.
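
The selective rule described above can be sketched as a per-sample gate on the distillation term. This is a minimal illustration, not the released training code: the cosine-distance alignment loss, the `kd_weight` parameter, the task-name strings, and the assumption that student and teacher features already share a dimension (any projection head is omitted) are all hypothetical.

```python
import math

# Tasks that receive feature distillation, per the description above.
ANNOTATION_TASKS = {"detection", "segmentation", "classification"}


def cosine_distance(u, v):
    """1 - cosine similarity; used here as an assumed alignment loss."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm > 0 else 1.0


def selective_loss(task, sft_loss, student_feat, teacher_feat, kd_weight=1.0):
    """Add the distillation term only for annotation samples."""
    if task in ANNOTATION_TASKS:
        return sft_loss + kd_weight * cosine_distance(student_feat, teacher_feat)
    # Interpretation samples: supervised fine-tuning loss only.
    return sft_loss


# Same features and SFT loss, different task: only the annotation sample
# is penalized for disagreeing with the teacher.
loss_interp = selective_loss("interpretation", 0.5, [1.0, 0.0], [0.0, 1.0])
loss_detect = selective_loss("detection", 0.5, [1.0, 0.0], [0.0, 1.0])
```

Gating per sample rather than per batch keeps mixed batches possible, although whether the original training mixes tasks within a batch is not stated.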

Teacher Ensemble

Teacher     Weight  Specialization
FetalCLIP   0.40    Contrastive vision-language alignment
UltraSAM    0.25    Spatial segmentation features
USF-MAE     0.20    Self-supervised reconstruction
UltraFedFM  0.15    Federated multi-domain features
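
One plausible reading of the weights above is a weighted sum of per-teacher alignment losses. The sketch below assumes exactly that; the function name is hypothetical, and the original implementation may instead weight features or logits.

```python
# Weights copied from the teacher table; they sum to 1.0.
TEACHER_WEIGHTS = {
    "FetalCLIP": 0.40,
    "UltraSAM": 0.25,
    "USF-MAE": 0.20,
    "UltraFedFM": 0.15,
}


def ensemble_kd_loss(per_teacher_loss, weights=TEACHER_WEIGHTS):
    """Combine per-teacher alignment losses into one distillation term."""
    missing = set(weights) - set(per_teacher_loss)
    if missing:
        raise KeyError(f"missing teacher losses: {sorted(missing)}")
    return sum(weights[name] * per_teacher_loss[name] for name in weights)


# Illustrative loss values only; these are not measured results.
example = ensemble_kd_loss(
    {"FetalCLIP": 1.0, "UltraSAM": 0.0, "USF-MAE": 0.0, "UltraFedFM": 0.0}
)
```

Because the weights sum to one, the combined term stays on the same scale as a single teacher's loss.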

Training Details

Parameter             Value
Base model            Qwen3.5-VL 4B
LoRA                  rank=16, alpha=16, applied to q/k/v/o/gate/up/down
Epochs                3
Learning rate         2e-4 (cosine schedule)
Effective batch size  8
Hardware              Single NVIDIA RTX 4090 (24 GB)
Training time         ~40 hours
Dataset               56,805 interpretation conversations + 12,000 annotation images
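
Two of these settings have simple closed forms worth spelling out. In the standard LoRA formulation the low-rank update is scaled by alpha / rank, so rank=16 with alpha=16 gives a scale of 1.0. The cosine schedule below is a sketch that assumes no warmup and decay to zero; neither detail is stated in this card, and the function names are hypothetical.

```python
import math

LORA_RANK = 16
LORA_ALPHA = 16
PEAK_LR = 2e-4


def lora_scale(alpha=LORA_ALPHA, rank=LORA_RANK):
    """Scaling factor applied to the low-rank update in standard LoRA."""
    return alpha / rank


def cosine_lr(step, total_steps, peak_lr=PEAK_LR):
    """Cosine decay from peak_lr to 0 (assumed: no warmup, floor of zero)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


# Learning rate at the start, midpoint, and end of a 1000-step run.
schedule = [cosine_lr(s, 1000) for s in (0, 500, 1000)]
```

Halfway through training the rate is half the peak under these assumptions; a warmup phase or a nonzero floor would shift these values.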

Citation

Links

Resource                  URL
Demo Video                YouTube
Web Application           HuggingFace Spaces
Dataset                   Zenodo (DOI: 10.5281/zenodo.20104811)
Source Code               GitHub
Mobile Model (0.8B GGUF)  HuggingFace

License

Apache License 2.0
