
VCoder LLaVA-1.5-13b

VCoder LLaVA-1.5-13b was trained on the COST training dataset in December 2023. It builds on the pretrained LLaVA-1.5-13b model weights and was introduced by Jain et al. in the VCoder repository.

VCoder is an adapter that improves existing Multimodal LLMs on object-level perception tasks by using perception modalities as control inputs, while retaining performance on other tasks.
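The model card does not document a transformers-native loading path; inference is handled by the LLaVA-based code in the VCoder repository. As a minimal sketch, the checkpoint can be fetched locally with huggingface_hub and then passed to that code:

```python
# Minimal sketch: fetch the VCoder LLaVA-1.5-13b weights with huggingface_hub.
# Running inference afterwards requires the VCoder repository's own LLaVA-based code;
# this snippet only downloads the checkpoint to the local cache.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="shi-labs/vcoder_llava-v1.5-13b")
print(f"Model weights downloaded to: {local_dir}")
```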


Citation

@article{jain2023vcoder,
    title={{VCoder: Versatile Vision Encoders for Multimodal Large Language Models}},
    author={Jitesh Jain and Jianwei Yang and Humphrey Shi},
    journal={arXiv},
    year={2023}
}
