VCoder-DS LLaVA-1.5-13b

VCoder-DS LLaVA-1.5-13b was trained on the COST training dataset in December 2023. It uses the pretrained LLaVA-1.5-13b model weights and was introduced by Jain et al. in this repository.

VCoder is an adapter that improves existing Multimodal LLMs on object-level perception tasks by using perception modalities (such as segmentation and depth maps) as control inputs, while retaining performance on other tasks.
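Below is a minimal usage sketch, assuming the checkpoint can be loaded with the Hugging Face transformers LLaVA classes; the image path and prompt template are placeholders, and the official VCoder repository provides the authoritative loading and inference code, including handling of the extra perception control inputs that this sketch omits.

```python
# Hypothetical sketch: assumes compatibility with the transformers LLaVA
# integration. The VCoder repository's own scripts also feed segmentation/depth
# maps as control inputs, which are not modeled here.
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "shi-labs/vcoder_ds_llava-v1.5-13b"

processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

image = Image.open("example.jpg")  # placeholder image path
prompt = "USER: <image>\nHow many objects are in the image? ASSISTANT:"

inputs = processor(images=image, text=prompt, return_tensors="pt").to(
    model.device, torch.float16
)
output_ids = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```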


Citation

@article{jain2023vcoder,
    title={{VCoder: Versatile Vision Encoders for Multimodal Large Language Models}},
    author={Jitesh Jain and Jianwei Yang and Humphrey Shi},
    journal={arXiv},
    year={2023}
}