---
license: apache-2.0
tags:
- llava
datasets:
- Ejafa/ye-pop
pipeline_tag: image-text-to-text
---
A ViT-B/32 CLIP model trained for 4 epochs on the [ye-pop](https://huggingface.co/datasets/Ejafa/ye-pop) dataset (491,520 images with detailed captions generated by [LLaVA 1.5](https://github.com/haotian-liu/LLaVA)). It is a research artifact of the [clip-synthetic-captions](https://github.com/nopperl/clip-synthetic-captions) project. On the [DataComp benchmark suite](https://datacomp.ai) (38 image classification and retrieval tasks), it outperforms an otherwise identical CLIP model trained on the dataset's original alt-texts.
Note: this model is likely not directly useful in practice, as it is severely undertrained.
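
## Usage

A minimal usage sketch, assuming the checkpoint is stored in the standard OpenCLIP format for the ViT-B-32 architecture (the DataComp tooling builds on OpenCLIP); the checkpoint path and image/caption inputs below are placeholders:

```python
import torch
from PIL import Image
import open_clip

# Load the ViT-B-32 architecture and restore weights from a local checkpoint
# (placeholder path; assumes an OpenCLIP-compatible state dict).
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="path/to/checkpoint.pt"
)
tokenizer = open_clip.get_tokenizer("ViT-B-32")
model.eval()

# Example zero-shot classification of a single image against two captions.
image = preprocess(Image.open("example.jpg")).unsqueeze(0)
text = tokenizer(["a photo of a cat", "a photo of a dog"])

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Normalize embeddings so the dot product is a cosine similarity.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print(probs)  # probability of the image matching each caption
```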