---
inference: false
---
<br>
<br>
## PRISM Models
All models were trained as part of the paper [Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models](https://arxiv.org/abs/2402.07865) by Siddharth Karamcheti, Suraj Nair, Ashwin Balakrishna, Percy Liang, Thomas Kollar, and Dorsa Sadigh. The latest PRISM models were trained in January 2024. For any issues or comments about these models, please raise an issue [here](https://github.com/TRI-ML/prismatic-vlms/issues).
## Licensing
PRISM models are released under an MIT license. Copyright (c) 2023 Siddharth Karamcheti, Suraj Nair, Ashwin Balakrishna, and Toyota Research Institute. Toyota did not provide any of the materials used to train these models. The models are provided here for reference, and to enable verification and evaluation of the training procedures described in the [paper](https://arxiv.org/abs/2402.07865) and supported by the [code](https://github.com/TRI-ML/prismatic-vlms). See the paper and the README in the training codebase for more details. These models are provided as-is. Toyota Research Institute disclaims all warranties, express or implied, including any warranty of merchantability and fitness for a particular purpose.
## Intended use
The primary use of PRISMs is for research and development on visually-conditioned language models by members of the machine learning and artificial intelligence research community. We hope these models can help users understand the design decisions that matter when training VLMs and serve as a first step in developing their own VLMs or evaluating existing VLMs for their desired use case.
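As a rough illustration, the sketch below shows how a PRISM checkpoint might be loaded and prompted via the `prismatic` package from the training codebase. The model ID (`prism-dinosiglip+7b`), the image path, and the exact `load` / `get_prompt_builder` / `generate` call signatures are assumptions based on the codebase's documented interface and may differ between releases; consult the [repository README](https://github.com/TRI-ML/prismatic-vlms) for the current API and available model IDs.

```python
import torch
from PIL import Image

from prismatic import load  # training codebase: https://github.com/TRI-ML/prismatic-vlms

# Assumed model ID; the list of available IDs is in the prismatic-vlms README.
MODEL_ID = "prism-dinosiglip+7b"

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load a pretrained PRISM checkpoint and move it to the device in bfloat16.
vlm = load(MODEL_ID)
vlm.to(device, dtype=torch.bfloat16)

# Open a local image (hypothetical path) and build a single-turn prompt.
image = Image.open("example.jpg").convert("RGB")
prompt_builder = vlm.get_prompt_builder()
prompt_builder.add_turn(role="human", message="What is happening in this image?")
prompt_text = prompt_builder.get_prompt()

# Generate a response conditioned on the image and the prompt.
generated_text = vlm.generate(
    image,
    prompt_text,
    do_sample=True,
    temperature=0.4,
    max_new_tokens=512,
)
print(generated_text)
```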
## Training dataset
- LLaVA 1.5 training dataset
- LVIS-Instruct-4V
- LRV-Instruct
## Evaluation procedure
All models are evaluated on a collection of 11 benchmarks probing visual question answering, bounding box prediction, and challenge tasks (evaluating counting, spatial relationships, and the propensity to hallucinate).