Transformers
PyTorch
English
bridgetower
Inference Endpoints
anahita-b committed on
Commit accc168
1 Parent(s): 8786058

Update examples in README.md

Files changed (1)
  1. README.md +35 -23
README.md CHANGED
@@ -28,42 +28,54 @@ You can use the raw model for image and text retrieval.
 
  ### How to use
 
- Here is how to use this model to get the features of a given text in PyTorch:
 
  ```python
- import os
- from PIL import Image
- from glob import glob
- from tqdm import tqdm
- import torch
  from transformers import BridgeTowerProcessor, BridgeTowerForImageAndTextRetrieval
 
- image_dir = "/datasets/COCO2017/val2017"
- search_text = "a woman holding an umbrella"
 
- processor = BridgeTowerProcessor.from_pretrained(("BridgeTower/bridgetower-base-itm-mlm"))
  model = BridgeTowerForImageAndTextRetrieval.from_pretrained("BridgeTower/bridgetower-base-itm-mlm")
 
- max_score = float('-inf')
- best_match_image = None
- image_paths = glob(os.path.join(image_dir, '*.jpg'))[:1000]
 
- for image_path in tqdm(image_paths, smoothing=1):
-     image = Image.open(image_path).convert("RGB")
-     inputs = processor(image, search_text, return_tensors="pt")
-     inputs = dict((k,v.to(device)) if isinstance(v, torch.Tensor) else (k,v) for k,v in inputs.items())
-     outputs = model(**inputs)
-     score = outputs.logits[0,1].item()
-     if score > max_score:
-         max_score = score
-         best_match_image = image_path
 
- print(max_score)
- print(best_match_image)
  ```
 
  ### Limitations and bias
 
  TODO
 
 
  ### How to use
 
+ Here is how to use this model to perform image and text matching:
+
  ```python
  from transformers import BridgeTowerProcessor, BridgeTowerForImageAndTextRetrieval
+ import requests
+ from PIL import Image
 
+ url = "http://images.cocodataset.org/val2017/000000039769.jpg"
+ image = Image.open(requests.get(url, stream=True).raw)
+ texts = ["An image of two cats chilling on a couch", "A football player scoring a goal"]
 
+ processor = BridgeTowerProcessor.from_pretrained("BridgeTower/bridgetower-base-itm-mlm")
  model = BridgeTowerForImageAndTextRetrieval.from_pretrained("BridgeTower/bridgetower-base-itm-mlm")
 
+ # score each candidate text against the image
+ scores = dict()
+ for text in texts:
+     # prepare image-text pair inputs
+     encoding = processor(image, text, return_tensors="pt")
+     # forward pass
+     outputs = model(**encoding)
+     # index 1 of the ITM head's logits is the match score
+     scores[text] = outputs.logits[0, 1].item()
+ ```
 
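The ITM logits collected above are unbounded match scores, not probabilities. A minimal follow-on sketch (assuming the `scores` dict from the example above is in scope) that ranks the candidates and applies a softmax for relative weights:

```python
import torch

# Assumes `scores` from the matching example above: {text: match logit}.
ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
print("best match:", ranked[0][0])

# A softmax across the candidates gives relative weights for this image;
# the raw ITM logits themselves are not probabilities.
probs = torch.softmax(torch.tensor([s for _, s in ranked]), dim=0)
for (text, _), p in zip(ranked, probs.tolist()):
    print(f"{p:.3f}  {text}")
```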
+ Here is how to use this model to perform masked language modeling:
 
+ ```python
+ from transformers import BridgeTowerProcessor, BridgeTowerForMaskedLM
+ import requests
+ from PIL import Image
+
+ url = "http://images.cocodataset.org/val2017/000000360943.jpg"
+ image = Image.open(requests.get(url, stream=True).raw).convert("RGB")
+ text = "a <mask> looking out of the window"
 
+ processor = BridgeTowerProcessor.from_pretrained("BridgeTower/bridgetower-base-itm-mlm")
+ model = BridgeTowerForMaskedLM.from_pretrained("BridgeTower/bridgetower-base-itm-mlm")
 
+ # prepare image-text inputs
+ encoding = processor(image, text, return_tensors="pt")
 
+ # forward pass
+ outputs = model(**encoding)
 
+ # decode the highest-scoring token at every position
+ results = processor.decode(outputs.logits.argmax(dim=-1).squeeze(0).tolist())
+
+ print(results)
+ # a cat looking out of the window.
  ```
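As an alternative to decoding the argmax at every position, a minimal follow-on sketch (assuming `processor`, `encoding`, and `outputs` from the example above are in scope) that inspects only the top candidate tokens at the `<mask>` position:

```python
import torch

# Assumes `processor`, `encoding`, and `outputs` from the MLM example above.
# Locate the <mask> token in the tokenized input.
mask_pos = (encoding.input_ids[0] == processor.tokenizer.mask_token_id).nonzero(as_tuple=True)[0][0]

# Top-5 candidate fillers for the masked slot, highest logit first.
top5 = torch.topk(outputs.logits[0, mask_pos], k=5)
for token_id, score in zip(top5.indices.tolist(), top5.values.tolist()):
    print(f"{score:7.2f}  {processor.tokenizer.decode([token_id]).strip()}")
```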
+
  ### Limitations and bias
 
  TODO