
	---
	library_name: transformers
	license: apache-2.0
	language:
	- en
	pipeline_tag: image-text-to-text
	---

	# probe_seg_llava-1.5-pt-0.5ift

	This checkpoint contains the segmentation (seg) probes for the LLaVA-1.5 model built on CLIP-ConvNeXT-XXL and Llama-3-8b, taken after the pre-training (PT) stage and 50% of the instruction fine-tuning (IFT) stage, i.e., after training on LLaVA-558K and 50% of LLaVA-665K. Please refer to the [documentation](https://github.com/SHI-Labs/OLA-VLM/blob/main/docs/Probing.md) for more details.

	- GitHub Repo: [https://github.com/SHI-Labs/OLA-VLM](https://github.com/SHI-Labs/OLA-VLM)
	- Project Page: [https://praeclarumjj3.github.io/ola_vlm/](https://praeclarumjj3.github.io/ola_vlm/)
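
A minimal sketch of fetching the checkpoint files locally, assuming the `huggingface_hub` package is installed and that the repository id is `shi-labs/probe_seg_llava-1.5-pt-0.5ift`. The helper name `download_probe_checkpoint` and the default target directory are hypothetical choices for this example. Loading the probes into a model follows the probing documentation in the GitHub repo and is not shown here.

```python
from pathlib import Path

# Assumed repository id of this checkpoint on the Hugging Face Hub.
REPO_ID = "shi-labs/probe_seg_llava-1.5-pt-0.5ift"


def download_probe_checkpoint(local_dir: str = "checkpoints/probe_seg") -> Path:
    """Download all files of the probe checkpoint and return the local folder.

    `local_dir` is a hypothetical destination chosen for this example.
    """
    # Imported lazily so the module can be imported without the dependency.
    from huggingface_hub import snapshot_download

    # snapshot_download fetches every file in the repository and
    # returns the path of the folder that now holds them.
    path = snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
    return Path(path)
```

Calling `download_probe_checkpoint()` needs network access; the files it retrieves depend on the repository contents.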

	## Citation

	If you find our work useful in your research, please consider starring ⭐ us on [GitHub](https://github.com/SHI-Labs/OLA-VLM) and citing 📚 us!

	```bibtex
	@article{jain2024ola_vlm,
	title={{OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation}},
	author={Jitesh Jain and Zhengyuan Yang and Humphrey Shi and Jianfeng Gao and Jianwei Yang},
	journal={arXiv},
	year={2024}
	}
	```