DiffusionGemma finetunes for radiology VQA

This repository contains LoRA finetunes of DiffusionGemma (an image-conditioned discrete-diffusion LLM) for radiology visual question answering (VQA), each paired with an autoregressive Gemma-4 finetune as a controlled baseline. It accompanies the paper "Discrete Diffusion Language Models for Interactive Radiology Report Drafting".

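The three datasets (VQA-RAD, SLAKE, VQA-Med) cover mixed modalities (X-ray, CT, MRI) and anatomy (head, chest, abdomen). For each backbone/dataset cell in the table below, the released checkpoint is the one that scored best under the LLM judge.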

Code: https://github.com/mxvp/discrete_diffusion_RRG

subfolder	backbone	base model	dataset	LLM-judge accuracy
diffusion-vqarad	discrete-diffusion	google/diffusiongemma-26B-A4B-it	VQA-RAD	0.649
ar-vqarad	autoregressive	google/gemma-4-26B-A4B-it	VQA-RAD	0.649
diffusion-slake	discrete-diffusion	google/diffusiongemma-26B-A4B-it	SLAKE	0.863
ar-slake	autoregressive	google/gemma-4-26B-A4B-it	SLAKE	0.817
diffusion-vqamed	discrete-diffusion	google/diffusiongemma-26B-A4B-it	VQA-Med	0.666
ar-vqamed	autoregressive	google/gemma-4-26B-A4B-it	VQA-Med	0.631
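
To make the paired comparison easier to read, the per-dataset gap between the diffusion and autoregressive finetunes can be computed directly from the table. The snippet below is a minimal sketch: the accuracies are copied from the rows above, while the `results` dictionary and the `accuracy_gaps` helper are illustrative names and not part of the repository's code.

```python
# LLM-judge accuracies copied from the results table above.
# The dictionary layout and function name are illustrative assumptions.
results = {
    "VQA-RAD": {"discrete-diffusion": 0.649, "autoregressive": 0.649},
    "SLAKE": {"discrete-diffusion": 0.863, "autoregressive": 0.817},
    "VQA-Med": {"discrete-diffusion": 0.666, "autoregressive": 0.631},
}


def accuracy_gaps(table):
    """Return diffusion minus autoregressive accuracy for each dataset."""
    return {
        dataset: round(
            scores["discrete-diffusion"] - scores["autoregressive"], 3
        )
        for dataset, scores in table.items()
    }


gaps = accuracy_gaps(results)
for dataset, gap in gaps.items():
    # A positive gap means the diffusion finetune scored higher.
    print(f"{dataset}: {gap:+.3f}")
```

On these numbers the diffusion finetune ties the autoregressive baseline on VQA-RAD and scores higher on SLAKE and VQA-Med.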