Ozan Oktay committed
Commit 278b9b5 • Parent(s): 61277cc
Update README.md

README.md CHANGED
@@ -52,6 +52,34 @@ The primary intended use is to support AI researchers building on top of this work.
 
 **Any** deployed use case of the model --- commercial or otherwise --- is currently out of scope. Although we evaluated the models using a broad set of publicly-available research benchmarks, the models and evaluations are not intended for deployed use cases. Please refer to [the associated paper](https://arxiv.org/abs/2204.09817) for more details.
 
+### How to use
+
+Here is how to use this model to extract radiological sentence embeddings and obtain their cosine similarity in the joint space (image and text):
+
+```python
+import torch
+from transformers import AutoConfig, AutoModel, AutoTokenizer
+
+# Load the model and tokenizer
+url = "microsoft/BiomedVLP-CXR-BERT-specialized"
+config = AutoConfig.from_pretrained(url, trust_remote_code=True)
+tokenizer = AutoTokenizer.from_pretrained(url, trust_remote_code=True)
+model = AutoModel.from_pretrained(url, config=config, trust_remote_code=True)
+
+# Input text prompts (e.g., reference, synonym, contradiction)
+text_prompts = ["There is no pneumothorax or pleural effusion",
+                "No pleural effusion or pneumothorax is seen",
+                "The extent of the pleural effusion is constant."]
+
+# Tokenize and compute the sentence embeddings
+tokenizer_output = tokenizer.batch_encode_plus(batch_text_or_text_pairs=text_prompts,
+                                               add_special_tokens=True,
+                                               padding='longest',
+                                               return_tensors='pt')
+embeddings = model.get_projected_text_embeddings(input_ids=tokenizer_output.input_ids,
+                                                 attention_mask=tokenizer_output.attention_mask)
+
+# Compute the 3x3 cosine similarity matrix of the sentence embeddings.
+# The projected embeddings are unit-normalized, so the dot products are cosine similarities.
+sim = torch.mm(embeddings, embeddings.t())
+```
+
 ## Data
 
 This model builds upon existing publicly-available datasets:
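A usage note on the snippet this commit adds: `get_projected_text_embeddings` returns unit-normalized vectors in the joint space (image and text), which is why a plain matrix product yields cosine similarities. Below is a minimal sketch of how one might inspect the resulting 3x3 matrix, assuming `sim` and `text_prompts` from the snippet are in scope; the expected ordering is inferred from the prompts' meanings, not an output recorded in this commit:

```python
# Minimal sketch: inspect the pairwise cosine similarities computed above.
# Assumes `sim` and `text_prompts` from the README snippet are already in scope.
import itertools

for i, j in itertools.combinations(range(len(text_prompts)), 2):
    print(f"cosine(prompt {i}, prompt {j}) = {sim[i, j].item():.3f}")

# Expected ordering (an assumption based on the prompts' meanings): prompts 0 and 1
# are paraphrases, so sim[0, 1] should be the largest off-diagonal entry, while
# prompt 2 contradicts both and should score lower.
```

Note that `batch_encode_plus` still works, but newer `transformers` releases favor calling the tokenizer directly, e.g. `tokenizer(text_prompts, padding='longest', return_tensors='pt')`, which is equivalent here since `add_special_tokens=True` is the default.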