---
license: apache-2.0
tags:
- image-text-to-text
- medical
- vision
---
# LLaVA-Med v1.5 (Mistral-7B)

LLaVA-Med v1.5 uses mistralai/Mistral-7B-Instruct-v0.2 as its LLM, chosen for its more permissive commercial license.
LLaVA-Med combines a pre-trained large language model with a pre-trained image encoder for biomedical multimodal chatbot use cases. It was proposed in [LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day](https://arxiv.org/abs/2306.00890) by Chunyuan Li, Cliff Wong, Sheng Zhang, Naoto Usuyama, Haotian Liu, Jianwei Yang, Tristan Naumann, Hoifung Poon, and Jianfeng Gao.
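To make that description concrete, here is a minimal, illustrative sketch of how a LLaVA-style model combines a pre-trained image encoder with a pre-trained LLM through a projection layer. The class, module, and argument names are assumptions for illustration only and do not reflect the repository's actual code.

```python
# Illustrative sketch of the LLaVA-style architecture described above.
# All names here are assumptions for illustration, not the repository's code.
import torch
import torch.nn as nn


class LlavaStyleModel(nn.Module):
    def __init__(self, vision_encoder, projector, language_model):
        super().__init__()
        self.vision_encoder = vision_encoder  # e.g. a pre-trained CLIP-style ViT
        self.projector = projector            # maps image features to the LLM embedding size
        self.language_model = language_model  # e.g. Mistral-7B-Instruct-v0.2

    def forward(self, pixel_values, text_embeds):
        # Encode the image and project its patch features into the LLM embedding space.
        image_embeds = self.projector(self.vision_encoder(pixel_values))
        # Prepend the projected image tokens to the text embeddings and run the LLM.
        inputs_embeds = torch.cat([image_embeds, text_embeds], dim=1)
        return self.language_model(inputs_embeds=inputs_embeds)
```

In practice the image tokens are spliced into the prompt at a placeholder position rather than simply prepended; see the LLaVA-Med repository for the actual implementation.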
**Model date:** LLaVA-Med-v1.5-Mistral-7B was trained in April 2024.

**Paper or resources for more information:** https://aka.ms/llava-med

**Where to send questions or comments about the model:** https://github.com/microsoft/LLaVA-Med/issues
## License

The model is released under the mistralai/Mistral-7B-Instruct-v0.2 license (Apache 2.0).
## Intended use

**Primary intended uses:** The primary use of LLaVA-Med is biomedical research on large multimodal models and chatbots.

**Primary intended users:** The primary intended users of the model are researchers and hobbyists in computer vision, natural language processing, machine learning, and artificial intelligence.
## Training dataset
- 500K filtered image-text pairs from PubMed.
- 60K GPT-generated multimodal instruction-following data.
## Evaluation dataset

A subset of questions and answers from three academic biomedical VQA datasets: VQA-RAD, SLAKE, and PathVQA.
## How to use

See the Serving and Evaluation instructions in the LLaVA-Med GitHub repository: https://github.com/microsoft/LLaVA-Med. The sketch below gives a rough idea of how loading the checkpoint looks.
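This is a minimal loading sketch only, assuming the llava package from the LLaVA-Med repository has been installed (clone the repository and run `pip install -e .`) and that it exposes `load_pretrained_model` as in the upstream LLaVA codebase; the exact prompt formatting, image preprocessing, and generation loop should follow the repository's Serving and Evaluation scripts.

```python
# Minimal loading sketch. Assumes the llava package from the LLaVA-Med
# repository is installed and exposes load_pretrained_model as in the
# upstream LLaVA codebase; treat this as an assumption, not a guarantee.
from llava.model.builder import load_pretrained_model

model_path = "microsoft/llava-med-v1.5-mistral-7b"  # Hugging Face checkpoint ID
tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path=model_path,
    model_base=None,
    model_name="llava-med-v1.5-mistral-7b",
)

# From here, images are preprocessed with image_processor and answers are
# produced with model.generate(); see the repository's serving and evaluation
# scripts for complete, supported examples of prompt and image-token handling.
```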
## BibTeX entry and citation info
@article{li2023llavamed,
title={Llava-med: Training a large language-and-vision assistant for biomedicine in one day},
author={Li, Chunyuan and Wong, Cliff and Zhang, Sheng and Usuyama, Naoto and Liu, Haotian and Yang, Jianwei and Naumann, Tristan and Poon, Hoifung and Gao, Jianfeng},
journal={arXiv preprint arXiv:2306.00890},
year={2023}
}