---
language:
- fa
library_name: hezar
tags:
- image-to-text
- hezar
metrics:
- wer
pipeline_tag: image-to-text
datasets:
- hezarai/flickr30k-fa
---

A Persian image captioning model built on a ViT + RoBERTa encoder-decoder architecture and trained on [flickr30k-fa](https://www.kaggle.com/datasets/sajjadayobi360/flickrfa) (created by Sajjad Ayoubi).
The encoder (ViT) was initialized from https://huggingface.co/google/vit-base-patch16-224 and the decoder (RoBERTa) was initialized from https://huggingface.co/HooshvareLab/roberta-fa-zwnj-base.

## Usage
```
pip install hezar
```
```python
from hezar.models import Model

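# Load the pretrained captioning model from the Hugging Face Hub
model = Model.load("hezarai/vit-roberta-fa-image-captioning-flickr30k")
# Generate a caption for a local image file
captions = model.predict("example_image.jpg")
print(captions)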
```
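
The metadata for this model lists WER (word error rate) as its metric. As a rough illustration of what that number measures, here is a minimal sketch that compares a predicted caption with a reference caption using word-level edit distance. The function name `word_error_rate` and the sample captions are hypothetical and are not part of hezar; the actual evaluation code may tokenize or normalize text differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative helper only (not part of hezar); splits on whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical captions, made up for this example.
reference_caption = "یک سگ در پارک می دود"
predicted_caption = "یک سگ در خیابان می دود"
print(word_error_rate(reference_caption, predicted_caption))
```

Lower values are better, and a value of 0 means the predicted caption matches the reference word for word.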