shi-labs/vcoder_ds_llava-v1.5-7b



praeclarumjj3 committed on Dec 20, 2023

Commit 2ab7577 • 1 Parent(s): 4d5a268

Update README.md

Files changed (1)

README.md +2 -2


@@ -6,7 +6,7 @@ license: apache-2.0
 VCoder-DS LLaVA-1.5-7b was trained on COST training dataset in December 2023. It uses the pretrained [LLaVA-1.5-7b](https://huggingface.co/liuhaotian/llava-v1.5-7b) model weights. It was introduced by Jain et al. in [this repository](https://github.com/SHI-Labs/VCoder).
-VCoder is an adapter for improving existing Vision LLMs at object-level perception tasks with the use of perception modalities as control inputs while retaining performance on other tasks.
+VCoder is an adapter for improving existing Multimodal LLMs at object-level perception tasks with the use of perception modalities as control inputs while retaining performance on other tasks.
 ![img](https://praeclarumjj3.github.io/vcoder/vcoder.svg)
@@ -14,7 +14,7 @@ VCoder is an adapter for improving existing Vision LLMs at object-level percepti
 ```bibtex
 @article{jain2023vcoder,
-    title={{VCoder: Versatile Visual Encoder for Accurate Object-Level Perception with Large Language Models}},
+    title={{VCoder: Versatile Vision Encoders for Multimodal Large Language Models}},
     author={Jitesh Jain and Jianwei Yang and Humphrey Shi},
     journal={arXiv},
     year={2023}