How to use mshz88/FADA-SKD-4B with PEFT:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base model, then attach the FADA-SKD LoRA adapter on top of it.
base_model = AutoModelForCausalLM.from_pretrained("unsloth/Qwen3.5-4B")
model = PeftModel.from_pretrained(base_model, "mshz88/FADA-SKD-4B")
```
FADA-SKD (4B)
Fetal Anatomy Delineation and Analysis — Selective Knowledge Distillation variant
Model Description
FADA-SKD (4B) is a vision-language model fine-tuned from Qwen3.5-VL (4B) using LoRA adapters and Selective Knowledge Distillation (SKD) from four ultrasound foundation models (FetalCLIP, UltraSAM, USF-MAE, UltraFedFM).
The model performs five fetal ultrasound tasks within a single end-to-end pipeline: clinical interpretation, anatomical classification, structure mapping, bounding-box detection, and polygon segmentation.
Key Features
- Unified multi-task: Interpretation, classification, detection, segmentation, and keypoint localization in a single model
- Selective KD: Feature distillation applied only to annotation tasks, preserving interpretation quality
- 5-phase pipeline: Interpret, Classify, Map, Detect, Segment
- Expert validated: Mean sonographer score 1.975/3.0 across 237 images (1 = clinically acceptable)
- Dual deployment: Autonomous mode and Human-in-the-Loop mode
Performance (4,478 test samples)
| Metric | Score |
|---|---|
| mAP@0.50 | 0.7671 |
| mAP@0.75 | 0.4402 |
| Dice | 0.8820 |
| IoU | 0.8149 |
| Classification Acc | 0.8379 |
| Sonographer Score | 1.975/3.0 |
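For readers unfamiliar with the metrics in the table, the following is a minimal sketch of how box IoU (the matching criterion behind mAP@0.50 and mAP@0.75) and mask IoU/Dice are commonly computed. The function names and toy inputs are illustrative assumptions, not the evaluation code used for this model.

```python
def box_iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mask_iou_and_dice(pred, target):
    """IoU and Dice for two equal-length flat binary masks (sequences of 0/1)."""
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    pred_area, target_area = sum(pred), sum(target)
    union = pred_area + target_area - inter
    # Two empty masks are treated as a perfect match (a convention, assumed here).
    iou = inter / union if union else 1.0
    dice = 2 * inter / (pred_area + target_area) if (pred_area + target_area) else 1.0
    return iou, dice


# Toy example: a predicted box covering 80% of the ground-truth box.
overlap = box_iou((0, 0, 10, 10), (0, 0, 10, 8))
counts_at_050 = overlap >= 0.50  # true positive under the mAP@0.50 criterion
counts_at_075 = overlap >= 0.75  # true positive under the mAP@0.75 criterion

# Toy example: two 4-pixel masks sharing 2 foreground pixels.
iou, dice = mask_iou_and_dice([1, 1, 1, 0], [0, 1, 1, 1])
```

For a single mask, Dice and IoU are linked by Dice = 2·IoU / (1 + IoU); scores averaged over a test set, as in the table, need not satisfy this identity exactly.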
Usage
Selective Knowledge Distillation
The core innovation is selectivity. Feature-level alignment from four domain-specific teachers is applied exclusively to annotation training data (detection, segmentation, classification), while interpretation training receives only supervised fine-tuning. This selective strategy outperforms full distillation across all tasks.
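The selective gating described above can be sketched as follows. This is a minimal illustration under stated assumptions: the cosine-distance feature loss, the `kd_weight` parameter, and the function names are hypothetical, since the model card does not specify the exact distillation objective.

```python
import math

# Tasks that receive feature distillation, per the description above.
# Interpretation is deliberately absent: it gets supervised fine-tuning only.
ANNOTATION_TASKS = {"detection", "segmentation", "classification"}


def cosine_distance(student_feat, teacher_feat):
    """1 - cosine similarity between two feature vectors (assumed KD loss form)."""
    dot = sum(s * t for s, t in zip(student_feat, teacher_feat))
    norm_s = math.sqrt(sum(s * s for s in student_feat))
    norm_t = math.sqrt(sum(t * t for t in teacher_feat))
    if norm_s == 0 or norm_t == 0:
        return 1.0
    return 1.0 - dot / (norm_s * norm_t)


def training_loss(task, sft_loss, student_feat=None, teacher_feat=None, kd_weight=1.0):
    """Add the distillation term only for annotation tasks."""
    if task not in ANNOTATION_TASKS:
        return sft_loss  # interpretation: supervised loss alone
    return sft_loss + kd_weight * cosine_distance(student_feat, teacher_feat)
```

With this gating, an interpretation sample contributes only its supervised loss even if teacher features happen to be available, which is the property the selective strategy relies on.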
Teacher Ensemble
| Teacher | Weight | Specialization |
|---|---|---|
| FetalCLIP | 0.40 | Contrastive vision-language alignment |
| UltraSAM | 0.25 | Spatial segmentation features |
| USF-MAE | 0.20 | Self-supervised reconstruction |
| UltraFedFM | 0.15 | Federated multi-domain features |
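One plausible reading of the weights in the table is a weighted sum of per-teacher distillation losses. The sketch below assumes that interpretation; the function name and the idea that each teacher yields a scalar loss are assumptions, not details confirmed by the model card.

```python
import math

# Weights copied from the Teacher Ensemble table; they sum to 1.0.
TEACHER_WEIGHTS = {
    "FetalCLIP": 0.40,
    "UltraSAM": 0.25,
    "USF-MAE": 0.20,
    "UltraFedFM": 0.15,
}


def ensemble_kd_loss(per_teacher_losses):
    """Weighted sum of scalar distillation losses, one per teacher (assumed form)."""
    missing = set(TEACHER_WEIGHTS) - set(per_teacher_losses)
    if missing:
        raise ValueError(f"missing losses for teachers: {sorted(missing)}")
    return sum(
        weight * per_teacher_losses[name]
        for name, weight in TEACHER_WEIGHTS.items()
    )


weights_total = sum(TEACHER_WEIGHTS.values())
```

Because the weights sum to 1.0, the combined loss stays on the same scale as the individual teacher losses.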
Training Details
| Parameter | Value |
|---|---|
| Base model | Qwen3.5-VL 4B |
| LoRA | rank=16, alpha=16, applied to the q/k/v/o attention projections and the gate/up/down MLP projections |
| Epochs | 3 |
| Learning rate | 2e-4 (cosine schedule) |
| Effective batch size | 8 |
| Hardware | Single NVIDIA RTX 4090 (24 GB) |
| Training time | ~40 hours |
| Dataset | 56,805 interpretation conversations + 12,000 annotation images |
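The LoRA and learning-rate settings in the table can be written down as a small sketch. The module names (`q_proj`, `gate_proj`, and so on) are an assumption based on common Qwen-style naming, and the schedule assumes no warmup and decay to zero, neither of which the table states.

```python
import math

# LoRA hyperparameters from the table. The keys match arguments that
# peft.LoraConfig accepts; the module names are assumed, not confirmed.
LORA_KWARGS = {
    "r": 16,
    "lora_alpha": 16,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "up_proj", "down_proj",     # MLP projections
    ],
}

# Standard LoRA scales the low-rank update by alpha / r.
LORA_SCALING = LORA_KWARGS["lora_alpha"] / LORA_KWARGS["r"]

PEAK_LR = 2e-4  # learning rate from the table


def cosine_lr(step, total_steps, peak_lr=PEAK_LR):
    """Cosine decay from peak_lr to 0 (no warmup, an assumption)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))
```

With rank and alpha both set to 16, the scaling factor is 1, so the adapter update is applied without additional amplification or damping.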
Citation
Links
| Resource | URL |
|---|---|
| Demo Video | YouTube |
| Web Application | HuggingFace Spaces |
| Dataset | Zenodo (DOI: 10.5281/zenodo.20104811) |
| Source Code | GitHub |
| Mobile Model (0.8B GGUF) | HuggingFace |
License
Apache License 2.0
