krypticmouse committed on
Commit 5fcaacd
1 Parent(s): 74406c3

Update README.md

Files changed (1): README.md (+32 -2)
README.md CHANGED
@@ -1,5 +1,35 @@
# Hindi Image Captioning Model

This is an encoder-decoder image captioning model that uses a ViT encoder and GPT2-Hindi as the decoder. It is a first attempt at using ViT + GPT2-Hindi for the image captioning task. To train the model, we used the Flickr8k Hindi Dataset, a translated version of the original Flickr8k Dataset that is available on Kaggle.

This model was trained during the Hugging Face Course community week, organized by Hugging Face. Training was done on Kaggle Notebooks.
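
For intuition about what the GPT2-Hindi decoder attends to, the helper below computes the shape of the ViT encoder's output. It is a hypothetical illustration, not code from this model. It assumes the encoder follows the `google/vit-base-patch16-224` configuration (224×224 input, 16×16 patches, hidden size 768), which is the checkpoint whose feature extractor the usage example loads. ViT splits the image into non-overlapping patches and prepends one [CLS] token, so each patch, plus [CLS], becomes one vector the decoder can cross-attend to.

```python
def vit_encoder_shape(image_size: int = 224, patch_size: int = 16, hidden_size: int = 768):
    """Return (sequence_length, hidden_size) of a ViT encoder's output.

    Hypothetical helper. Assumes a square input, non-overlapping square
    patches, and one prepended [CLS] token, as in ViT-Base/16 at 224x224.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    num_patches = patches_per_side * patches_per_side
    return num_patches + 1, hidden_size  # +1 for the [CLS] token


seq_len, dim = vit_encoder_shape()
print(f"{seq_len} encoder tokens of size {dim}")  # 197 encoder tokens of size 768
```

Under these assumptions, each image becomes 14 × 14 = 196 patch vectors plus one [CLS] vector, i.e. a sequence of 197 vectors of size 768.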

## How to use

Here is how to use this model to caption an image from the Flickr8k dataset:
```python
import torch
import requests
from PIL import Image
from transformers import ViTFeatureExtractor, AutoTokenizer, VisionEncoderDecoderModel

# Run on the GPU if one is available, otherwise fall back to the CPU
if torch.cuda.is_available():
    device = 'cuda'
else:
    device = 'cpu'

# Download a sample image; convert to RGB so grayscale/RGBA images also work
url = 'https://shorturl.at/fvxEQ'
image = Image.open(requests.get(url, stream=True).raw).convert('RGB')

# ViT preprocessor, GPT2-Hindi tokenizer, and the fine-tuned encoder-decoder model
feature_extractor = ViTFeatureExtractor.from_pretrained('google/vit-base-patch16-224')
tokenizer = AutoTokenizer.from_pretrained('surajp/gpt2-hindi')
model = VisionEncoderDecoderModel.from_pretrained('team-indain-image-caption/hindi-image-captioning').to(device)

# Inference
sample = feature_extractor(image, return_tensors="pt").pixel_values.to(device)

def clean_text(text):
    # Drop the GPT-2 end-of-text token and keep only the first line of output
    return text.replace('<|endoftext|>', '').split('\n')[0]

caption_ids = model.generate(sample, max_length=50)[0]
caption_text = clean_text(tokenizer.decode(caption_ids))
print(caption_text)
```
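
`model.generate(sample, max_length=50)` builds the caption one token at a time. With default arguments, and unless the model's generation config overrides them, `generate` uses greedy decoding: at each step it appends the highest-scoring next token, stopping at the end-of-sequence token or once `max_length` is reached. The plain-Python loop below sketches that idea only. `greedy_decode`, `toy_scores`, the `<s>` start token, and the tiny score table are hypothetical stand-ins; in the real model the scores come from the GPT2-Hindi decoder conditioned on the ViT image features.

```python
from typing import Callable, Dict, List


def greedy_decode(
    next_token_scores: Callable[[List[str]], Dict[str, float]],
    start_token: str,
    eos_token: str,
    max_length: int,
) -> List[str]:
    """Greedy decoding: repeatedly append the highest-scoring next token.

    Stops when the end-of-sequence token is produced or when the sequence
    (including the start token) reaches max_length.
    """
    tokens = [start_token]
    while len(tokens) < max_length:
        scores = next_token_scores(tokens)
        best = max(scores, key=scores.get)  # argmax over the candidate tokens
        tokens.append(best)
        if best == eos_token:
            break
    return tokens


# Hypothetical stand-in for the decoder: fixed next-token scores keyed by the
# previous token. A real decoder conditions on the whole prefix and the image.
TOY_SCORES = {
    "<s>": {"एक": 0.9, "कुत्ता": 0.1},
    "एक": {"कुत्ता": 0.8, "<|endoftext|>": 0.2},
    "कुत्ता": {"दौड़ता": 0.7, "<|endoftext|>": 0.3},
    "दौड़ता": {"है": 0.6, "<|endoftext|>": 0.4},
    "है": {"<|endoftext|>": 1.0},
}


def toy_scores(prefix: List[str]) -> Dict[str, float]:
    return TOY_SCORES[prefix[-1]]


tokens = greedy_decode(toy_scores, "<s>", "<|endoftext|>", max_length=50)
caption = " ".join(t for t in tokens[1:] if t != "<|endoftext|>")
print(caption)  # एक कुत्ता दौड़ता है
```

Beam search (`num_beams > 1`) or sampling (`do_sample=True`) would replace the single argmax step with tracking several candidate sequences or drawing from the score distribution.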