Ozan Oktay committed
Commit 278b9b5 • Parent(s): 61277cc
Update README.md

README.md CHANGED
@@ -52,6 +52,34 @@ The primary intended use is to support AI researchers building on top of this work.
 
 **Any** deployed use case of the model --- commercial or otherwise --- is currently out of scope. Although we evaluated the models using a broad set of publicly-available research benchmarks, the models and evaluations are not intended for deployed use cases. Please refer to [the associated paper](https://arxiv.org/abs/2204.09817) for more details.
 
+### How to use
+
+Here is how to use this model to extract radiological sentence embeddings and obtain their cosine similarity in the joint space (image and text):
+
+```python
+import torch
+from transformers import AutoConfig, AutoModel, AutoTokenizer
+
+# Load the model and tokenizer
+url = "microsoft/BiomedVLP-CXR-BERT-specialized"
+config = AutoConfig.from_pretrained(url, trust_remote_code=True)
+tokenizer = AutoTokenizer.from_pretrained(url, trust_remote_code=True)
+model = AutoModel.from_pretrained(url, config=config, trust_remote_code=True)
+
+# Input text prompts (e.g., reference, synonym, contradiction)
+text_prompts = ["There is no pneumothorax or pleural effusion",
+                "No pleural effusion or pneumothorax is seen",
+                "The extent of the pleural effusion is constant."]
+
+# Tokenize and compute the sentence embeddings
+tokenizer_output = tokenizer.batch_encode_plus(batch_text_or_text_pairs=text_prompts,
+                                               add_special_tokens=True,
+                                               padding='longest',
+                                               return_tensors='pt')
+embeddings = model.get_projected_text_embeddings(input_ids=tokenizer_output.input_ids,
+                                                 attention_mask=tokenizer_output.attention_mask)
+
+# Compute the 3x3 cosine similarity matrix of the sentence embeddings.
+# The projected embeddings are unit-normalized, so the dot products are cosine similarities.
+sim = torch.mm(embeddings, embeddings.t())
+```
+
 ## Data
 
 This model builds upon existing publicly-available datasets:
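A usage note on the snippet this commit adds: `get_projected_text_embeddings` returns unit-normalized vectors in the joint space (image and text), which is why a plain matrix product yields cosine similarities. Below is a minimal sketch of how one might inspect the resulting 3x3 matrix, assuming `sim` and `text_prompts` from the snippet are in scope; the expected ordering is inferred from the prompts' meanings, not an output recorded in this commit:

```python
# Minimal sketch: inspect the pairwise cosine similarities computed above.
# Assumes `sim` and `text_prompts` from the README snippet are already in scope.
import itertools

for i, j in itertools.combinations(range(len(text_prompts)), 2):
    print(f"cosine(prompt {i}, prompt {j}) = {sim[i, j].item():.3f}")

# Expected ordering (an assumption based on the prompts' meanings): prompts 0 and 1
# are paraphrases, so sim[0, 1] should be the largest off-diagonal entry, while
# prompt 2 contradicts both and should score lower.
```

Note that `batch_encode_plus` still works, but newer `transformers` releases favor calling the tokenizer directly, e.g. `tokenizer(text_prompts, padding='longest', return_tensors='pt')`, which is equivalent here since `add_special_tokens=True` is the default.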