Generative AI Radiology VLM (Florence-2)

This model is a parameter-efficient fine-tune (PEFT/LoRA) of Microsoft's Florence-2-base. It was trained on the VQA-RAD dataset to serve as a generative vision-language model that answers free-form text questions about medical X-rays.

Model Details

  • Architecture: Vision Encoder + Sequence-to-Sequence Text Encoder-Decoder (Florence-2)
  • Task: Medical Visual Question Answering (VQA)
  • Fine-Tuning Technique: Low-Rank Adaptation (LoRA)
  • Target Modules: q_proj, v_proj, o_proj
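As a minimal sketch of how the details above might map onto a PEFT configuration: only the target modules come from this card. The rank, alpha, and dropout values, and the helper names `lora_kwargs` and `attach_lora`, are assumptions for illustration and not necessarily what was used to train these adapters.

```python
# Only TARGET_MODULES is taken from the model card; every other value is assumed.
TARGET_MODULES = ["q_proj", "v_proj", "o_proj"]


def lora_kwargs(rank=8, alpha=16, dropout=0.05):
    """Collect keyword arguments for peft.LoraConfig.

    rank, alpha and dropout are placeholder values, not the trained settings.
    """
    return {
        "r": rank,                    # rank of the low-rank update matrices
        "lora_alpha": alpha,          # scaling factor applied to the update
        "lora_dropout": dropout,      # dropout on the LoRA branch during training
        "target_modules": list(TARGET_MODULES),
        "bias": "none",               # leave bias terms frozen
    }


def attach_lora(base_model, **overrides):
    """Wrap an already loaded Florence-2 model with trainable LoRA adapters."""
    # Imported lazily so the configuration helper works without peft installed.
    from peft import LoraConfig, get_peft_model

    config = LoraConfig(**lora_kwargs(**overrides))
    return get_peft_model(base_model, config)
```

Calling `attach_lora(model)` on a loaded base model returns a wrapped model in which only the adapter weights on the listed projection layers are trainable.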

Training Results

The model was fine-tuned for 3 epochs on an NVIDIA A100-40GB GPU using mixed precision (fp16). The training loss decreased steadily over the run, which suggests the adapters converged on the dataset's anatomical terms and answer vocabulary.
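The run described above could be expressed with the Transformers `TrainingArguments` class roughly as follows. This is a sketch under assumptions: the card confirms only the 3 epochs and fp16 mixed precision. The batch size, learning rate, logging interval, output directory, and the use of `Trainer` at all are illustrative guesses, and the helper names are hypothetical.

```python
def training_kwargs(output_dir="florence2-vqa-lora"):
    """Keyword arguments for transformers.TrainingArguments.

    Only num_train_epochs and fp16 come from the model card.
    """
    return {
        "output_dir": output_dir,            # assumed directory name
        "num_train_epochs": 3,               # from the model card
        "fp16": True,                        # from the model card (mixed precision)
        "per_device_train_batch_size": 8,    # assumed
        "learning_rate": 1e-4,               # assumed
        "logging_steps": 10,                 # assumed; controls loss logging frequency
        "remove_unused_columns": False,      # assumed; keeps image columns for a custom collator
    }


def make_training_args(**overrides):
    """Build TrainingArguments, letting callers override any assumed value."""
    # Imported lazily so the settings can be inspected without transformers installed.
    from transformers import TrainingArguments

    return TrainingArguments(**{**training_kwargs(), **overrides})
```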

[Figure: Training loss curve]

Local Web UI (Gradio)

The repository includes an app.py script that loads these LoRA adapters and launches a local Gradio web UI for inference.
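The following sketch shows one way such a script could be structured; it is not the repository's actual app.py. The adapter ID is taken from this model's Hub page. The `<VQA>` task prefix, the generation settings, and all function names are assumptions, and the prefix in particular should be replaced with whatever prompt format was used during fine-tuning.

```python
BASE_ID = "microsoft/Florence-2-base"
ADAPTER_ID = "MrEngineer/florence-2-vqa-lora"


def build_prompt(question, task_prefix="<VQA>"):
    """Prepend a task prefix to the question (the prefix is an assumption)."""
    return f"{task_prefix}{question.strip()}"


def load_model():
    """Load the base model and processor, then attach the LoRA adapters."""
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoProcessor

    processor = AutoProcessor.from_pretrained(BASE_ID, trust_remote_code=True)
    base = AutoModelForCausalLM.from_pretrained(BASE_ID, trust_remote_code=True)
    model = PeftModel.from_pretrained(base, ADAPTER_ID).eval()
    return model, processor


def answer(model, processor, image, question, max_new_tokens=64):
    """Generate a free-form answer for one PIL image and one question."""
    import torch

    inputs = processor(
        text=build_prompt(question),
        images=image.convert("RGB"),
        return_tensors="pt",
    )
    with torch.no_grad():
        ids = model.generate(
            input_ids=inputs["input_ids"],
            pixel_values=inputs["pixel_values"],
            max_new_tokens=max_new_tokens,
        )
    return processor.batch_decode(ids, skip_special_tokens=True)[0].strip()


def launch_ui():
    """Start a local Gradio interface with an image input and a question box."""
    import gradio as gr

    model, processor = load_model()
    demo = gr.Interface(
        fn=lambda image, question: answer(model, processor, image, question),
        inputs=[gr.Image(type="pil", label="Radiology image"),
                gr.Textbox(label="Question")],
        outputs=gr.Textbox(label="Answer"),
    )
    demo.launch()
```

Calling `launch_ui()` downloads the base weights and adapters on first use and serves the interface on a local port.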

[Figure: Gradio web UI demo]

Framework versions

  • PEFT 0.11.1
  • Transformers 4.42.4