krypticmouse committed
Commit 0e8c75e
1 parent: 5fcaacd

Update README.md

Files changed (1): README.md (+14 -5)
README.md CHANGED
@@ -1,8 +1,8 @@
 # Hindi Image Captioning Model
 
-This is an encoder-decoder image captioning model made with VIT encoder and GPT2-Hindi as a decoder. This is a first attempt at using ViT + GPT2-Hindi for image captioning task. We used the Flickr8k Hindi Dataset, which is the translated version of the original Flickr8k Dataset, available on kaggle to train the model.
+This is an encoder-decoder image captioning model with a ViT encoder and GPT2-Hindi as the decoder, a first attempt at using ViT + GPT2-Hindi for the image captioning task. We used the Flickr8k Hindi Dataset, available on Kaggle, to train the model.
 
-This model was trained using HuggingFace course community week, organized by Huggingface. Training were done on Kaggle Notebooks.
+This model was trained during the HuggingFace course community week, organized by HuggingFace.
 
 ## How to use
 
@@ -21,8 +21,11 @@ else:
 url = 'https://shorturl.at/fvxEQ'
 image = Image.open(requests.get(url, stream=True).raw)
 
-feature_extractor = ViTFeatureExtractor.from_pretrained('google/vit-base-patch16-224')
-tokenizer = AutoTokenizer.from_pretrained('surajp/gpt2-hindi')
+encoder_checkpoint = 'google/vit-base-patch16-224'
+decoder_checkpoint = 'surajp/gpt2-hindi'
+
+feature_extractor = ViTFeatureExtractor.from_pretrained(encoder_checkpoint)
+tokenizer = AutoTokenizer.from_pretrained(decoder_checkpoint)
 model = VisionEncoderDecoderModel.from_pretrained('team-indain-image-caption/hindi-image-captioning').to(device)
 
 #Inference
@@ -32,4 +35,10 @@ clean_text = lambda x: x.replace('<|endoftext|>','').split('\n')[0]
 caption_ids = model.generate(sample, max_length = 50)[0]
 caption_text = clean_text(tokenizer.decode(caption_ids))
 print(caption_text)
 ```
+
+## Training data
+We used the Flickr8k Hindi Dataset, a translated version of the original Flickr8k Dataset, available on Kaggle, to train the model.
+
+## Training procedure
+This model was trained during the HuggingFace course community week, organized by HuggingFace. Training was done on a Kaggle GPU.
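The "How to use" snippet appears only as scattered diff fragments above, without its imports. A self-contained sketch of the same inference flow might look like the following — the checkpoint IDs come from the README itself, while the function names, device handling, and the placement of heavy imports inside the function are assumptions for illustration:

```python
def clean_text(text):
    """Strip GPT-2's end-of-text marker and keep only the first line,
    mirroring the clean_text lambda in the README snippet."""
    return text.replace('<|endoftext|>', '').split('\n')[0]


def caption_image(url):
    """Hypothetical wrapper around the README's inference steps.
    Heavy imports live inside the function so clean_text stays usable
    without torch/transformers installed."""
    import requests
    import torch
    from PIL import Image
    from transformers import (AutoTokenizer, ViTFeatureExtractor,
                              VisionEncoderDecoderModel)

    device = 'cuda' if torch.cuda.is_available() else 'cpu'

    # Checkpoint IDs as given in the README diff.
    feature_extractor = ViTFeatureExtractor.from_pretrained('google/vit-base-patch16-224')
    tokenizer = AutoTokenizer.from_pretrained('surajp/gpt2-hindi')
    model = VisionEncoderDecoderModel.from_pretrained(
        'team-indain-image-caption/hindi-image-captioning').to(device)

    # Fetch the image and turn it into pixel values for the ViT encoder.
    image = Image.open(requests.get(url, stream=True).raw)
    sample = feature_extractor(image, return_tensors='pt').pixel_values.to(device)

    # Generate caption ids with the decoder, then decode and clean them.
    caption_ids = model.generate(sample, max_length=50)[0]
    return clean_text(tokenizer.decode(caption_ids))
```

Keeping the post-processing in a named `clean_text` function (rather than the README's lambda) makes it easy to test in isolation, since the model-dependent part requires downloading several hundred megabytes of weights.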