
---
language:
- fa
library_name: hezar
tags:
- image-to-text
- hezar
metrics:
- wer
pipeline_tag: image-to-text
datasets:
- hezarai/flickr30k-fa
---

A Persian image captioning model with a ViT encoder and a GPT2 decoder, trained on [flickr30k-fa](https://www.kaggle.com/datasets/sajjadayobi360/flickrfa) (created by Sajjad Ayoubi).
The encoder (ViT) was initialized from [google/vit-base-patch16-224](https://huggingface.co/google/vit-base-patch16-224) and the decoder (GPT2) from [HooshvareLab/gpt2-fa](https://huggingface.co/HooshvareLab/gpt2-fa).
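
The encoder checkpoint's name encodes its input geometry: 224x224 images split into non-overlapping 16x16 patches. Assuming the standard ViT layout with one prepended `[CLS]` token (an assumption about the encoder, not something this card states), the sketch below computes how many encoder tokens the GPT2 decoder can attend to per image. The helper `vit_sequence_length` is hypothetical and not part of Hezar.

```python
def vit_sequence_length(image_size: int, patch_size: int, use_cls_token: bool = True) -> int:
    """Hypothetical helper: number of tokens a ViT encoder emits for a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    num_patches = patches_per_side * patches_per_side
    # Assumption: a single [CLS] token is prepended, as in the standard ViT design
    return num_patches + (1 if use_cls_token else 0)


# Settings implied by the name "vit-base-patch16-224": 224x224 input, 16x16 patches
print(vit_sequence_length(224, 16))  # 197 = 14 * 14 patches + 1 [CLS]
```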

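## Usage
```
pip install hezar
```
```python
from hezar.models import Model

# Load the pretrained captioning model from the Hugging Face Hub
model = Model.load("hezarai/vit-gpt2-fa-image-captioning-flickr30k")
# Generate a Persian caption for a local image file
captions = model.predict("example_image.jpg")
print(captions)
```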
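
The card lists WER (word error rate) as its metric: the word-level edit distance between a reference caption and a generated one, divided by the number of reference words. The card does not say how captions were tokenized or normalized before scoring, so the minimal sketch below assumes plain whitespace splitting; `word_error_rate` is a hypothetical helper, not a Hezar API.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: (substitutions + deletions + insertions) / reference word count."""
    # Assumption: whitespace tokenization with no further text normalization
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the first i-1 reference words and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution (or match when cost is 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


# One substituted word out of four reference words
print(word_error_rate("یک سگ روی چمن", "یک گربه روی چمن"))  # 0.25
```

Because the denominator is the reference length, WER can exceed 1.0 when the generated caption contains many extra words.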