Ozan Oktay committed
Commit 278b9b5
1 Parent(s): 61277cc

Update README.md

Files changed (1)
  1. README.md +28 -0
README.md CHANGED
@@ -52,6 +52,34 @@ The primary intended use is to support AI researchers building on top of this wo

  **Any** deployed use case of the model --- commercial or otherwise --- is currently out of scope. Although we evaluated the models using a broad set of publicly-available research benchmarks, the models and evaluations are not intended for deployed use cases. Please refer to [the associated paper](https://arxiv.org/abs/2204.09817) for more details.

+ ### How to use
+
+ Here is how to use this model to extract radiological sentence embeddings and obtain their cosine similarity in the joint space (image and text):
+
+ ```python
+ import torch
+ from transformers import AutoConfig, AutoModel, AutoTokenizer
+
+ # Load the model and tokenizer (this model ships custom remote code)
+ url = "microsoft/BiomedVLP-CXR-BERT-specialized"
+ config = AutoConfig.from_pretrained(url, use_auth_token=True, trust_remote_code=True)
+ tokenizer = AutoTokenizer.from_pretrained(url, use_auth_token=True, trust_remote_code=True)
+ model = AutoModel.from_pretrained(url, config=config, use_auth_token=True, trust_remote_code=True)
+
+ # Input text prompts (e.g., reference, synonym, contradiction)
+ text_prompts = ["There is no pneumothorax or pleural effusion",
+                 "No pleural effusion or pneumothorax is seen",
+                 "The extent of the pleural effusion is constant."]
+
+ # Tokenize and compute the sentence embeddings
+ tokenizer_output = tokenizer.batch_encode_plus(batch_text_or_text_pairs=text_prompts,
+                                                add_special_tokens=True,
+                                                padding='longest',
+                                                return_tensors='pt')
+ embeddings = model.get_projected_text_embeddings(input_ids=tokenizer_output.input_ids,
+                                                  attention_mask=tokenizer_output.attention_mask)
+
+ # Compute the 3x3 cosine similarity of the sentence embeddings obtained from the input text prompts
+ sim = torch.mm(embeddings, embeddings.t())
+ ```
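+
+ The resulting 3x3 matrix `sim` holds the pairwise sentence similarities. As a minimal, illustrative continuation of the snippet above (assuming the projected embeddings are L2-normalized, so that each dot product is a cosine similarity), the matrix can be inspected like this:
+
+ ```python
+ # Print the pairwise similarities row by row; the paraphrase pair (prompts 0
+ # and 1) is expected to score higher than either pairing with the
+ # contradictory prompt 2.
+ for i, row in enumerate(sim.tolist()):
+     print(f"prompt {i}: " + ", ".join(f"{v:.3f}" for v in row))
+ ```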
+
  ## Data

  This model builds upon existing publicly-available datasets: