Dileep7729 committed · verified
Commit 1033eed · 1 Parent(s): fa63d22

Update README.md

Files changed (1):
  1. README.md +26 -11
README.md CHANGED
@@ -19,29 +19,44 @@ This is a fine-tuned version of BLIP for visual answering on images. This model

 This experimental model can be used for answering questions about product images in the retail industry. Product metadata enrichment and validation of human-generated product descriptions are some example use cases.

- Examples: (place images here)
-
- Input Image | Model Output
- ___________________________
-
- ![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/672d17c98e098bf429c83670/-Ux5mU-JDpZvdhNq-sSiw.jpeg) Model Output:- chips nachos
-
- ![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/672d17c98e098bf429c83670/-Z87gp9zWg2FiLTUCu8Ir.jpeg) Model Output:- a man in a suit walking across a crosswalk
-
- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/672d17c98e098bf429c83670/YcSs_CFcRj-Tb4woXIArC.png) Model Output:- bush ' s best white beans
-
- ## Sample model predictions
-
- | Image | Description |
- |-------------------------------------|--------------------------------|
- | <img src="https://cdn-uploads.huggingface.co/production/uploads/672d17c98e098bf429c83670/YcSs_CFcRj-Tb4woXIArC.png" width=100 height=100 /> | bush ' s best white beans |
+ ## Sample model predictions
+
+ | Image | Description |
+ |------------------------------------------------------------------------------------------------------------------|-------------|
+ | <img src="https://cdn-uploads.huggingface.co/production/uploads/672d17c98e098bf429c83670/YcSs_CFcRj-Tb4woXlArC.png" width="100" height="100" /> | bush 's best white beans |
+ | <img src="https://cdn-uploads.huggingface.co/production/uploads/672d17c98e098bf429c83670/lTediQ7Zuez_CQQR7YIY0.png" width="100" height="100" /> | a bottle of milk sitting on a counter |
+ | <img src="https://cdn-uploads.huggingface.co/production/uploads/672d17c98e098bf429c83670/7r5oJ7BiSFkLt3nmT3RIv.jpeg" alt="image/jpeg" width="100" height="100" /> | a man in a suit walking across a crosswalk |
+
+ ## How to use the model
+
+ ```python
+ import requests
+ from PIL import Image
+ from transformers import BlipProcessor, BlipForConditionalGeneration
+
+ # Load the fine-tuned processor and model from the Hugging Face Hub
+ processor = BlipProcessor.from_pretrained("quadranttechnologies/qhub-blip-image-captioning-finetuned")
+ model = BlipForConditionalGeneration.from_pretrained("quadranttechnologies/qhub-blip-image-captioning-finetuned")
+
+ # Download a demo image
+ img_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg'
+ raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')
+
+ # Conditional image captioning: the text prompt steers the generated caption
+ text = "a photography of"
+ inputs = processor(raw_image, text, return_tensors="pt")
+ out = model.generate(**inputs)
+ print(processor.decode(out[0], skip_special_tokens=True))
+
+ # Unconditional image captioning
+ inputs = processor(raw_image, return_tensors="pt")
+ out = model.generate(**inputs)
+ print(processor.decode(out[0], skip_special_tokens=True))
+ ```
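+
+ As a minimal illustrative sketch of the description-validation use case mentioned above, the caption produced by this model can be compared against an existing human-written product description. The product image URL, the sample description, and the token-overlap heuristic below are assumptions for illustration only, not part of this repository.
+
+ ```python
+ import requests
+ from PIL import Image
+ from transformers import BlipProcessor, BlipForConditionalGeneration
+
+ processor = BlipProcessor.from_pretrained("quadranttechnologies/qhub-blip-image-captioning-finetuned")
+ model = BlipForConditionalGeneration.from_pretrained("quadranttechnologies/qhub-blip-image-captioning-finetuned")
+
+ # Hypothetical product image and human-written description (placeholders, not from this model card)
+ product_image_url = "https://example.com/product.jpg"
+ human_description = "white beans in a can"
+
+ raw_image = Image.open(requests.get(product_image_url, stream=True).raw).convert("RGB")
+
+ # Generate the model's own description of the product image
+ inputs = processor(raw_image, return_tensors="pt")
+ out = model.generate(**inputs)
+ model_description = processor.decode(out[0], skip_special_tokens=True)
+
+ # Naive token-overlap score as one possible validation heuristic (an assumption, not this project's method)
+ model_tokens = set(model_description.lower().split())
+ human_tokens = set(human_description.lower().split())
+ overlap = len(model_tokens & human_tokens) / max(len(human_tokens), 1)
+
+ print(f"Model description: {model_description}")
+ print(f"Token overlap with human description: {overlap:.2f}")
+ ```
+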
  ## BibTex and citation info