FADA-SKD (4B)

Fetal Anatomy Delineation and Analysis — Selective Knowledge Distillation variant

Demo Video

Model Description

FADA-SKD (4B) is a vision-language model fine-tuned from Qwen3.5-VL (4B) using LoRA adapters and Selective Knowledge Distillation (SKD) from four ultrasound foundation models (FetalCLIP, UltraSAM, USF-MAE, UltraFedFM).

The model performs five fetal ultrasound tasks within a single end-to-end pipeline: clinical interpretation, anatomical classification, structure mapping, bounding-box detection, and polygon segmentation.

Key Features

Unified multi-task: Interpretation, classification, detection, segmentation, and keypoint localization in a single model
Selective KD: Feature distillation applied only to annotation tasks, preserving interpretation quality
5-phase pipeline: Interpret, Classify, Map, Detect, Segment
Expert validated: Mean sonographer score 1.975/3.0 across 237 images (1 = clinically acceptable)
Dual deployment: Autonomous mode and Human-in-the-Loop mode

Performance (4,478 test samples)

Metric	Score
mAP@0.50	0.7671
mAP@0.75	0.4402
Dice	0.8820
IoU	0.8149
Classification Acc	0.8379
Sonographer Score	1.975/3

Usage

Selective Knowledge Distillation

The core innovation: feature-level alignment from four domain-specific teachers is applied exclusively to annotation training data (detection, segmentation, classification), while interpretation training receives only supervised fine-tuning. This selective strategy outperforms full distillation across all tasks.

Teacher Ensemble

Teacher	Weight	Specialization
FetalCLIP	0.40	Contrastive vision-language alignment
UltraSAM	0.25	Spatial segmentation features
USF-MAE	0.20	Self-supervised reconstruction
UltraFedFM	0.15	Federated multi-domain features

Training Details

Parameter	Value
Base model	Qwen3.5-VL 4B
LoRA	rank=16, alpha=16, applied to q/k/v/o/gate/up/down
Epochs	3
Learning rate	2e-4 (cosine schedule)
Effective batch size	8
Hardware	Single NVIDIA RTX 4090 (24GB)
Training time	~40 hours
Dataset	56,805 interpretation conversations + 12,000 annotation images

Citation

Links

Resource	URL
Demo Video	YouTube
Web Application	HuggingFace Spaces
Dataset	Zenodo (DOI: 10.5281/zenodo.20104811)
Source Code	GitHub
Mobile Model (0.8B GGUF)	HuggingFace

License

Apache License 2.0

Downloads last month: 49