krypticmouse committed
Commit • 0e8c75e • Parent(s): 5fcaacd
Update README.md
README.md CHANGED
@@ -1,8 +1,8 @@
 # Hindi Image Captioning Model
 
-This is an encoder-decoder image captioning model made with VIT encoder and GPT2-Hindi as a decoder. This is a first attempt at using ViT + GPT2-Hindi for image captioning task. We used the Flickr8k Hindi Dataset
+This is an encoder-decoder image captioning model with a ViT encoder and GPT2-Hindi as the decoder. It is a first attempt at using ViT + GPT2-Hindi for an image captioning task. We used the Flickr8k Hindi Dataset, available on Kaggle, to train the model.
 
-This model was trained using HuggingFace course community week, organized by Huggingface.
+This model was trained during the HuggingFace course community week, organized by HuggingFace.
 
 ## How to use
 
@@ -21,8 +21,11 @@ else:
 url = 'https://shorturl.at/fvxEQ'
 image = Image.open(requests.get(url, stream=True).raw)
 
-
-
+encoder_checkpoint = 'google/vit-base-patch16-224'
+decoder_checkpoint = 'surajp/gpt2-hindi'
+
+feature_extractor = ViTFeatureExtractor.from_pretrained(encoder_checkpoint)
+tokenizer = AutoTokenizer.from_pretrained(decoder_checkpoint)
 model = VisionEncoderDecoderModel.from_pretrained('team-indain-image-caption/hindi-image-captioning').to(device)
 
 #Inference
@@ -32,4 +35,10 @@ clean_text = lambda x: x.replace('<|endoftext|>','').split('\n')[0]
 caption_ids = model.generate(sample, max_length = 50)[0]
 caption_text = clean_text(tokenizer.decode(caption_ids))
 print(caption_text)
-```
+```
+
+## Training data
+We used the Flickr8k Hindi Dataset, which is a translated version of the original Flickr8k Dataset, available on Kaggle, to train the model.
+
+## Training procedure
+This model was trained during the HuggingFace course community week, organized by HuggingFace. The training was done on a Kaggle GPU.