---
license: apache-2.0
tags:
  - image-text-to-text
  - medical
  - vision
---

# LLaVA-Med v1.5, using mistralai/Mistral-7B-Instruct-v0.2 as LLM for a better commercial license

LLaVA-Med combines a pre-trained large language model with a pre-trained image encoder for biomedical multimodal chatbot use cases. LLaVA-Med was proposed in [LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day](https://arxiv.org/abs/2306.00890) by Chunyuan Li, Cliff Wong, Sheng Zhang, Naoto Usuyama, Haotian Liu, Jianwei Yang, Tristan Naumann, Hoifung Poon, and Jianfeng Gao.

**Model date:** LLaVA-Med-v1.5-Mistral-7B was trained in April 2024.

**Paper or resources for more information:** https://aka.ms/llava-med

**Where to send questions or comments about the model:** https://github.com/microsoft/LLaVA-Med/issues

## License

This model is released under the same Apache 2.0 license as mistralai/Mistral-7B-Instruct-v0.2.

## Intended use

**Primary intended uses:** The primary use of LLaVA-Med is biomedical research on large multimodal models and chatbots.

**Primary intended users:** The primary intended users of the model are researchers and hobbyists in computer vision, natural language processing, machine learning, and artificial intelligence.

## Training dataset

- 500K filtered image-text pairs from PubMed.
- 60K GPT-generated multimodal instruction-following data.

## Evaluation dataset

Medical Visual Chat

## How to use

See the Serving and Evaluation sections of the [LLaVA-Med GitHub repository](https://github.com/microsoft/LLaVA-Med).
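
As a minimal sketch, one way to get started is to download the checkpoint from the Hugging Face Hub and then run inference through the LLaVA-Med codebase. The repository id `microsoft/llava-med-v1.5-mistral-7b` and the CLI invocation in the comments are assumptions patterned after the upstream LLaVA project; defer to the repository's Serving and Evaluation instructions for the authoritative commands.

```python
# Minimal sketch: fetch the checkpoint with huggingface_hub.
# The repo_id below is an assumption; verify it on the Hugging Face Hub.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="microsoft/llava-med-v1.5-mistral-7b")
print(f"Checkpoint downloaded to: {local_dir}")

# Inference is expected to go through the LLaVA-Med repository's serving and
# evaluation scripts rather than plain `transformers` auto-classes. The command
# below is an assumption patterned after the upstream LLaVA CLI; see the
# repository's Serving section for the exact invocation.
#
#   python -m llava.serve.cli \
#       --model-path microsoft/llava-med-v1.5-mistral-7b \
#       --image-file path/to/biomedical_image.png
```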

## BibTeX entry and citation info

```bibtex
@article{li2023llavamed,
  title={Llava-med: Training a large language-and-vision assistant for biomedicine in one day},
  author={Li, Chunyuan and Wong, Cliff and Zhang, Sheng and Usuyama, Naoto and Liu, Haotian and Yang, Jianwei and Naumann, Tristan and Poon, Hoifung and Gao, Jianfeng},
  journal={arXiv preprint arXiv:2306.00890},
  year={2023}
}
```