ozanoktay committed on
Commit fc335c4
1 Parent(s): c4dd911

Update README.md

Files changed (1)
  1. README.md +5 -4
README.md CHANGED
@@ -107,7 +107,10 @@ These datasets reflect a broad variety of sources ranging from biomedical abstra

 ## Performance

-The presented model achieves state-of-the-art results in radiology natural language inference by leveraging semantics and discourse characteristics at training time more efficiently. The experiments were performed on the RadNLI and MS-CXR-T benchmarks, which measure the quality of text embeddings in terms of static and temporal semantics respectively. BioViL-T is benchmarked against other commonly used language models, including [PubMedBERT](https://aka.ms/pubmedbert) and [CXR-BERT](https://aka.ms/biovil).
+The presented model achieves state-of-the-art results in radiology natural language inference by leveraging semantics and discourse characteristics more efficiently at training time.
+The experiments were performed on the RadNLI and MS-CXR-T benchmarks, which measure the quality of text embeddings in terms of static and temporal semantics, respectively.
+BioViL-T is benchmarked against other commonly used state-of-the-art domain-specific BERT models, including [PubMedBERT](https://aka.ms/pubmedbert) and [CXR-BERT](https://aka.ms/biovil).
+The results below show that BioViL-T yields sentence embeddings with increased sensitivity to temporal content (MS-CXR-T) whilst better capturing static content (RadNLI).

 | | MS-CXR-T | MS-CXR-T | RadNLI (2 classes) | RadNLI (2 classes) |
 | ----------------------------------------------- | :-------------------------------: | :----------------------: | :-------------------------: | :-------------: |
@@ -116,7 +119,6 @@ The presented model achieves state-of-the-art results in radiology natural langu
 | [CXR-BERT-General](https://huggingface.co/microsoft/BiomedVLP-CXR-BERT-general) | 62.60 | .601 | 87.59 | .902 |
 | [CXR-BERT-Specialized](https://huggingface.co/microsoft/BiomedVLP-CXR-BERT-specialized) | 78.12 | .837 | 89.66 | .932 |
 | **BioViL-T** | **87.77** | **.933** | **90.52** | **.947** |
-<br/>

 The novel pretraining framework also yields better vision-language representations. Below is the zero-shot phrase grounding performance obtained on the [MS-CXR](https://physionet.org/content/ms-cxr/0.1/) benchmark dataset, which evaluates the quality of image-text latent representations.

@@ -125,9 +127,8 @@ The novel pretraining framework yields also better vision-language representatio
 | BioViL | 1.07 +- 0.04 | 0.229 +- 0.005 |
 | BioViL-L | 1.21 +- 0.05 | 0.202 +- 0.010 |
 | **BioViL-T** | **1.33 +- 0.04** | **0.240 +- 0.005** |
-<br/>

-Additional experimental results and discussion can be found in the corresponding paper, [Learning to Exploit Temporal Structure for Biomedical Vision–Language Processing](https://arxiv.org/abs/2301.04558).
+Additional experimental results and discussion can be found in the corresponding paper, ["Learning to Exploit Temporal Structure for Biomedical Vision–Language Processing", CVPR'23](https://arxiv.org/abs/2301.04558).


 ## Limitations
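
The updated text above frames BioViL-T's gains in terms of sentence-embedding quality on RadNLI and MS-CXR-T. Below is a minimal sketch of how such embeddings could be probed with the standard `transformers` API, assuming the `microsoft/BiomedVLP-BioViL-T` checkpoint loads as a BERT-style encoder; the model card's own usage example, which may expose a dedicated projected-embedding method, is authoritative.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Assumed checkpoint ID; see the BioViL-T model card for the exact usage.
MODEL_ID = "microsoft/BiomedVLP-BioViL-T"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModel.from_pretrained(MODEL_ID, trust_remote_code=True)
model.eval()

# Sentences probing temporal semantics (MS-CXR-T style): the first two are
# paraphrases, the third contradicts them.
sentences = [
    "Pleural effusion has worsened since the prior study.",
    "Interval worsening of the pleural effusion.",
    "The pleural effusion is stable.",
]

with torch.no_grad():
    inputs = tokenizer(sentences, padding=True, return_tensors="pt")
    outputs = model(**inputs, output_hidden_states=True)
    # Mean-pool the final hidden states as a generic sentence embedding; the
    # released model may provide a dedicated projected embedding instead.
    mask = inputs["attention_mask"].unsqueeze(-1).float()
    emb = (outputs.hidden_states[-1] * mask).sum(dim=1) / mask.sum(dim=1)
    emb = torch.nn.functional.normalize(emb, dim=-1)

# Pairwise cosine similarities: a temporally sensitive encoder should score
# the paraphrase pair higher than either pairing with the contradiction.
print(emb @ emb.T)
```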
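The phrase-grounding metrics (CNR and mIoU) can likewise be made concrete. The sketch below uses random tensors as stand-ins for the projected image-patch and phrase embeddings; the grid size, embedding dimension, threshold, and CNR formula are illustrative assumptions, not the paper's exact evaluation code.

```python
import torch
import torch.nn.functional as F

# Stand-in shapes: a real pipeline would take patch embeddings from the image
# encoder and the phrase embedding from the text encoder, both projected into
# the shared latent space. Grid size and dimension are assumptions.
H, W, D = 15, 15, 128
patch_emb = F.normalize(torch.randn(H, W, D), dim=-1)  # image patch embeddings
phrase_emb = F.normalize(torch.randn(D), dim=-1)       # text phrase embedding

# Zero-shot grounding: cosine similarity between the phrase and every patch
# yields a heatmap over the image.
similarity_map = patch_emb @ phrase_emb                # (H, W) heatmap

# Hypothetical ground-truth region for the phrase.
box = torch.zeros(H, W, dtype=torch.bool)
box[4:9, 6:12] = True

# CNR contrasts similarity inside vs. outside the annotated region
# (one common definition); mIoU scores the overlap of the thresholded
# heatmap with that region.
sim_in, sim_out = similarity_map[box], similarity_map[~box]
cnr = (sim_in.mean() - sim_out.mean()) / torch.sqrt(sim_in.var() + sim_out.var())
pred = similarity_map > similarity_map.quantile(0.9)   # simple threshold (assumption)
iou = (pred & box).sum() / (pred | box).sum()
print(f"CNR={cnr.item():.3f}  IoU={iou.item():.3f}")
```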