jaimin committed
Commit ffeedea
1 Parent(s): 20e51b2

Update README.md

Files changed (1)
1. README.md +5 -35
README.md CHANGED
@@ -3,27 +3,8 @@ tags:
 - image-to-text
 - image-captioning
 license: apache-2.0
-widget:
-- src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/savanna.jpg
-  example_title: Savanna
-- src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/football-match.jpg
-  example_title: Football Match
-- src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/airport.jpg
-  example_title: Airport
 ---
 
-# nlpconnect/vit-gpt2-image-captioning
-
-This is an image captioning model trained by @ydshieh in [flax ](https://github.com/huggingface/transformers/tree/main/examples/flax/image-captioning) this is pytorch version of [this](https://huggingface.co/ydshieh/vit-gpt2-coco-en-ckpts).
-
-
-# The Illustrated Image Captioning using transformers
-
-![](https://ankur3107.github.io/assets/images/vision-encoder-decoder.png)
-
-* https://ankur3107.github.io/blogs/the-illustrated-image-captioning-using-transformers/
-
-
 # Sample running code
 
 ```python
@@ -32,9 +13,9 @@ from transformers import VisionEncoderDecoderModel, ViTFeatureExtractor, AutoTok
 import torch
 from PIL import Image
 
-model = VisionEncoderDecoderModel.from_pretrained("nlpconnect/vit-gpt2-image-captioning")
-feature_extractor = ViTFeatureExtractor.from_pretrained("nlpconnect/vit-gpt2-image-captioning")
-tokenizer = AutoTokenizer.from_pretrained("nlpconnect/vit-gpt2-image-captioning")
+model = VisionEncoderDecoderModel.from_pretrained("jaimin/image_caption")
+feature_extractor = ViTFeatureExtractor.from_pretrained("jaimin/image_caption")
+tokenizer = AutoTokenizer.from_pretrained("jaimin/image_caption")
 
 device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
 model.to(device)
@@ -63,8 +44,6 @@ def predict_step(image_paths):
   return preds
 
 
-predict_step(['doctor.e16ba4e4.jpg']) # ['a woman in a hospital bed with a woman in a hospital bed']
-
 ```
 
 # Sample running code using transformers pipeline
@@ -73,18 +52,9 @@ predict_step(['doctor.e16ba4e4.jpg']) # ['a woman in a hospital bed with a woman
 
 from transformers import pipeline
 
-image_to_text = pipeline("image-to-text", model="nlpconnect/vit-gpt2-image-captioning")
-
-image_to_text("https://ankur3107.github.io/assets/images/image-captioning-example.png")
+image_to_text = pipeline("image-to-text", model="jaimin/image_caption")
 
-# [{'generated_text': 'a soccer game with a player jumping to catch the ball '}]
 
 
-```
-
 
-# Contact for any help
-* https://huggingface.co/ankur310794
-* https://twitter.com/ankur310794
-* http://github.com/ankur3107
-* https://www.linkedin.com/in/ankur310794
+```
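
The diff only shows fragments of the README's "Sample running code" section (the imports, the `from_pretrained` calls, and the tail of `predict_step`). For context, here is a minimal end-to-end sketch of what that section runs after this commit. It follows the upstream nlpconnect/vit-gpt2-image-captioning recipe; the generation settings (`max_length=16`, `num_beams=4`), the body of `predict_step`, and the `example.jpg` path are assumptions for illustration, not content of this commit.

```python
# Minimal sketch of the captioning code the README fragments refer to.
# Assumes jaimin/image_caption is a ViT-GPT2 VisionEncoderDecoder checkpoint
# like the upstream nlpconnect model; generation settings are illustrative.
import torch
from PIL import Image
from transformers import VisionEncoderDecoderModel, ViTFeatureExtractor, AutoTokenizer

model = VisionEncoderDecoderModel.from_pretrained("jaimin/image_caption")
feature_extractor = ViTFeatureExtractor.from_pretrained("jaimin/image_caption")
tokenizer = AutoTokenizer.from_pretrained("jaimin/image_caption")

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

gen_kwargs = {"max_length": 16, "num_beams": 4}  # assumed values, tune as needed

def predict_step(image_paths):
    # Load images and force RGB so the feature extractor gets 3-channel input
    images = []
    for image_path in image_paths:
        image = Image.open(image_path)
        if image.mode != "RGB":
            image = image.convert(mode="RGB")
        images.append(image)

    # Preprocess to pixel values, generate token ids, decode to captions
    pixel_values = feature_extractor(images=images, return_tensors="pt").pixel_values
    pixel_values = pixel_values.to(device)
    output_ids = model.generate(pixel_values, **gen_kwargs)
    preds = tokenizer.batch_decode(output_ids, skip_special_tokens=True)
    return [pred.strip() for pred in preds]

print(predict_step(["example.jpg"]))  # "example.jpg" is a placeholder path
```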
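
The pipeline hunk keeps the `pipeline("image-to-text", ...)` construction but drops the example call and sample output that the old README showed. A short usage sketch, again assuming `jaimin/image_caption` loads as an image-to-text checkpoint; the image path is a placeholder.

```python
from transformers import pipeline

# The "image-to-text" pipeline wraps the same checkpoint behind a one-call API
image_to_text = pipeline("image-to-text", model="jaimin/image_caption")

# Accepts a local path or an image URL; returns a list of dicts with "generated_text"
result = image_to_text("example.jpg")  # placeholder image path
print(result)
```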