---
language:
- en
tags:
- image-to-text
license: mit
datasets:
- coco2017
---

# Vit2-DistilGPT2
This model takes an image as input and outputs a caption. It was trained on the COCO dataset, and the full training script can be found in [this Kaggle kernel](https://www.kaggle.com/sachin/visionencoderdecoder-model-training).

## Usage
```python
from PIL import Image
from transformers import GPT2Tokenizer, ViTFeatureExtractor, VisionEncoderDecoderModel

model = VisionEncoderDecoderModel.from_pretrained("sachin/vit2distilgpt2")
vit_feature_extractor = ViTFeatureExtractor.from_pretrained("google/vit-base-patch16-224-in21k")

# Make sure GPT-2 prepends BOS and appends EOS to every sequence
# (mirrors the tokenizer patch used in the training script).
def build_inputs_with_special_tokens(self, token_ids_0, token_ids_1=None):
    outputs = [self.bos_token_id] + token_ids_0 + [self.eos_token_id]
    return outputs

GPT2Tokenizer.build_inputs_with_special_tokens = build_inputs_with_special_tokens
gpt2_tokenizer = GPT2Tokenizer.from_pretrained("distilgpt2")
# Set pad_token to unk_token -> be careful here, as unk_token_id == eos_token_id == bos_token_id for GPT-2
gpt2_tokenizer.pad_token = gpt2_tokenizer.unk_token

image_path = "example.jpg"  # replace with the path to your image
# The feature extractor already returns a batched tensor of shape (1, 3, 224, 224)
pixel_values = vit_feature_extractor(images=Image.open(image_path).convert("RGB"), return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values)
generated_sentences = gpt2_tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
```
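Newer versions of `transformers` also ship an `image-to-text` pipeline. Assuming this checkpoint's config and tokenizer files are compatible with it (untested here), the snippet above may reduce to:
```python
from transformers import pipeline

# Assumes a transformers version that includes the "image-to-text" pipeline
# and that the checkpoint bundles the tokenizer/feature-extractor files it needs.
captioner = pipeline("image-to-text", model="sachin/vit2distilgpt2")
print(captioner("example.jpg"))  # returns a list like [{"generated_text": "..."}]
```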
Note that the generated caption may contain repeated phrases, so a post-processing step may be required.
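Alternatively, repetition can often be reduced at generation time. A minimal sketch, assuming the `model`, `gpt2_tokenizer`, and `pixel_values` from the snippet above (the parameter values are illustrative, not the settings used in training):
```python
# Beam search with n-gram blocking to discourage repeated phrases.
generated_ids = model.generate(
    pixel_values,
    max_length=32,           # cap the caption length
    num_beams=4,             # beam search tends to give cleaner captions than greedy decoding
    no_repeat_ngram_size=2,  # forbid any 2-gram from appearing twice
    early_stopping=True,     # stop once all beams have finished
)
caption = gpt2_tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(caption)
```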

## Bias Warning
This model may be biased due to the dataset, the short training run, and the model architecture itself. The image below shows an example of gender bias in its captions.
![](https://i.imgur.com/9zVN022.png)