Med-Flamingo-9B (CLIP ViT-L/14, Llama-7B)

Med-Flamingo is a medical vision-language model with multimodal in-context learning abilities.

This model is based on the OpenFlamingo-9B (v1) model, which uses the CLIP ViT-L/14 vision encoder and the Llama-7B language model as frozen backbones.

Med-Flamingo was trained on paired and interleaved image-text data from the medical literature.

Check out our GitHub repo for more details on setup and the demo.
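To illustrate what "interleaved image-text" input looks like at inference time, here is a minimal sketch of building a few-shot multimodal prompt in the OpenFlamingo style, where each image is marked by an `<image>` token and in-context examples are separated by `<|endofchunk|>`. The token names follow OpenFlamingo's conventions; the helper function and the medical Q&A examples are hypothetical placeholders, not part of this repo.

```python
# Hedged sketch: build an interleaved few-shot prompt string in the
# OpenFlamingo style. Each image position is marked with "<image>" and
# each completed in-context example ends with "<|endofchunk|>".
# The function name and examples are illustrative, not from the repo.

def build_interleaved_prompt(examples, query_question):
    """Build a few-shot multimodal prompt string.

    examples: list of (question, answer) pairs, one image per example.
    query_question: the question to ask about the final query image.
    """
    parts = []
    for question, answer in examples:
        parts.append(f"<image>Question: {question} Answer: {answer}<|endofchunk|>")
    # The final chunk is left open so the model generates the answer.
    parts.append(f"<image>Question: {query_question} Answer:")
    return "".join(parts)

prompt = build_interleaved_prompt(
    [("What abnormality is shown?", "A left-sided pleural effusion.")],
    "What abnormality is shown?",
)
print(prompt)
```

The actual images would be supplied separately as pixel tensors, aligned one-to-one with the `<image>` tokens in the text.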
