Generative AI Radiology VLM (Florence-2)

This model is a parameter-efficient fine-tune (PEFT/LoRA) of Microsoft's Florence-2-base. It was trained on the VQA-RAD dataset to act as a generative vision-language model that answers free-form text questions about medical X-rays.

Model Details

Architecture: Vision Encoder + Text Decoder (Florence-2)
Task: Medical Visual Question Answering (VQA)
Fine-Tuning Technique: Low-Rank Adaptation (LoRA)
Target Modules: q_proj, v_proj, o_proj
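
As a rough illustration of how the target modules listed above could be expressed with the PEFT library, here is a minimal sketch. Only the module names come from this card; the rank, alpha, and dropout values and the helper names `lora_settings` and `wrap_with_lora` are placeholder assumptions, not the settings actually used for this model.

```python
# Module names are taken from this model card.
TARGET_MODULES = ["q_proj", "v_proj", "o_proj"]


def lora_settings(rank=8, alpha=16, dropout=0.05):
    """Build keyword arguments for peft.LoraConfig.

    rank, alpha, and dropout are assumed placeholder values; the card
    does not state the ones used in training.
    """
    return {
        "r": rank,
        "lora_alpha": alpha,
        "lora_dropout": dropout,
        "target_modules": list(TARGET_MODULES),
        "bias": "none",
    }


def wrap_with_lora(base_model, **overrides):
    """Attach LoRA adapters to an already loaded base model."""
    # Imported lazily so the settings can be inspected without peft installed.
    from peft import LoraConfig, get_peft_model

    config = LoraConfig(**lora_settings(**overrides))
    return get_peft_model(base_model, config)
```

Calling `wrap_with_lora(model)` on a loaded Florence-2-base model would freeze the base weights and train only the low-rank matrices attached to the three projection layers.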

Training Results

The model was fine-tuned for 3 epochs on an NVIDIA A100-40GB GPU using mixed precision (fp16). The training loss decreased steadily over the run, suggesting that the adapters fit the dataset's anatomical terms and answer vocabulary.
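
The run described above might map onto Hugging Face `TrainingArguments` roughly as follows. The epoch count and fp16 flag come from this card; the output directory, batch size, learning rate, and logging interval are illustrative assumptions, and `training_settings` and `make_training_args` are hypothetical helper names.

```python
def training_settings(output_dir="florence2-vqa-lora"):
    """Build keyword arguments for transformers.TrainingArguments.

    Only num_train_epochs and fp16 reflect the card; every other value
    is an assumed placeholder.
    """
    return {
        "output_dir": output_dir,            # assumed path
        "num_train_epochs": 3,               # from the card
        "fp16": True,                        # mixed precision, from the card
        "per_device_train_batch_size": 8,    # assumed
        "learning_rate": 1e-4,               # assumed
        "logging_steps": 10,                 # assumed
        "remove_unused_columns": False,      # keep image columns for a custom collator
    }


def make_training_args(**kwargs):
    """Create the TrainingArguments object (fp16 requires a CUDA GPU)."""
    # Imported lazily so the settings can be inspected without transformers installed.
    from transformers import TrainingArguments

    return TrainingArguments(**training_settings(**kwargs))
```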

Local Web UI (Gradio)

The repository includes an app.py script that loads these LoRA adapters and starts a local Gradio web UI for inference.
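
The actual app.py is not reproduced here. The sketch below shows one way such a script could be written, assuming the adapters are published as `MrEngineer/florence-2-vqa-lora` and that questions are prefixed with a `<VQA>` task token. The prompt format, generation settings, and the function names `build_prompt` and `build_demo` are assumptions, not details confirmed by this card.

```python
BASE_ID = "microsoft/Florence-2-base"
ADAPTER_ID = "MrEngineer/florence-2-vqa-lora"  # assumed adapter repository id


def build_prompt(question, task_prefix="<VQA>"):
    """Prefix the question with a task token (the prefix is an assumption)."""
    return f"{task_prefix}{question.strip()}"


def build_demo():
    """Load the base model plus LoRA adapters and return a Gradio interface."""
    # Heavy imports are kept inside the function so importing this file stays cheap.
    import gradio as gr
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoProcessor

    # Florence-2 ships custom modeling code, hence trust_remote_code=True.
    processor = AutoProcessor.from_pretrained(BASE_ID, trust_remote_code=True)
    base = AutoModelForCausalLM.from_pretrained(BASE_ID, trust_remote_code=True)

    # Load the adapters, then fold them into the base weights for inference.
    model = PeftModel.from_pretrained(base, ADAPTER_ID).merge_and_unload()
    model.eval()

    def answer(image, question):
        inputs = processor(
            text=build_prompt(question),
            images=image.convert("RGB"),
            return_tensors="pt",
        )
        with torch.no_grad():
            generated = model.generate(
                input_ids=inputs["input_ids"],
                pixel_values=inputs["pixel_values"],
                max_new_tokens=64,  # assumed limit for short answers
                num_beams=3,        # assumed decoding setting
            )
        return processor.batch_decode(generated, skip_special_tokens=True)[0]

    return gr.Interface(
        fn=answer,
        inputs=[gr.Image(type="pil", label="Image"), gr.Textbox(label="Question")],
        outputs=gr.Textbox(label="Answer"),
        title="Radiology VQA (Florence-2 + LoRA)",
    )
```

Running `build_demo().launch()` would start the interface on a local port. The sketch keeps the model on the CPU for simplicity; moving the model and inputs to a GPU is left to the reader.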

Framework versions

PEFT 0.11.1
Transformers 4.42.4
