Ozan Oktay committed
Commit
896d61e
1 Parent(s): 6fcad0b

Update Readme -- Add information about BioViL Resnet50.

Files changed (1)
  1. README.md +12 -9
README.md CHANGED
@@ -27,16 +27,22 @@ First, we pretrain [**CXR-BERT-general**](https://huggingface.co/microsoft/Biome
 | CXR-BERT-general | [microsoft/BiomedVLP-CXR-BERT-general](https://huggingface.co/microsoft/BiomedVLP-CXR-BERT-general) | PubMed & MIMIC | Pretrained for biomedical literature and clinical domains |
 | CXR-BERT-specialized (after multi-modal training) | [microsoft/BiomedVLP-CXR-BERT-specialized](https://huggingface.co/microsoft/BiomedVLP-CXR-BERT-specialized) | PubMed & MIMIC | Pretrained for chest X-ray domain |
 
+## Image model
+
+**CXR-BERT-specialized** is jointly trained with a ResNet-50 image model in a multi-modal contrastive learning framework. Prior to multi-modal learning, the image model is pretrained on the same set of MIMIC-CXR images using [SimCLR](https://arxiv.org/abs/2002.05709). The corresponding model definition and its loading functions are available in our [HI-ML-Multimodal](https://github.com/microsoft/hi-ml/blob/main/hi-ml-multimodal/src/health_multimodal/image/model/model.py) GitHub repository. The joint image and text model, [BioViL](https://arxiv.org/abs/2204.09817), can be used in phrase grounding applications, as shown in this Python notebook [example](https://mybinder.org/v2/gh/microsoft/hi-ml/HEAD?labpath=hi-ml-multimodal%2Fnotebooks%2Fphrase_grounding.ipynb). Additionally, see the [MS-CXR benchmark](https://physionet.org/content/ms-cxr/0.1/) for a more systematic evaluation of joint image and text models on phrase grounding tasks.
+
 ## Citation
 
+The corresponding manuscript has been accepted for presentation at the [**European Conference on Computer Vision (ECCV) 2022**](https://eccv2022.ecva.net/).
+
-```
+```bibtex
 @misc{https://doi.org/10.48550/arxiv.2204.09817,
+doi = {10.48550/ARXIV.2204.09817},
+url = {https://arxiv.org/abs/2204.09817},
-title = {Making the Most of Text Semantics to Improve Biomedical Vision-Language Processing},
 author = {Boecking, Benedikt and Usuyama, Naoto and Bannur, Shruthi and Castro, Daniel C. and Schwaighofer, Anton and Hyland, Stephanie and Wetscherek, Maria and Naumann, Tristan and Nori, Aditya and Alvarez-Valle, Javier and Poon, Hoifung and Oktay, Ozan},
+title = {Making the Most of Text Semantics to Improve Biomedical Vision-Language Processing},
 publisher = {arXiv},
 year = {2022},
-url = {https://arxiv.org/abs/2204.09817},
-doi = {10.48550/ARXIV.2204.09817},
 }
 ```
 
@@ -127,9 +133,6 @@ This model was developed using English corpora, and thus can be considered Engli
 
 ## Further information
 
-Please refer to the corresponding paper, [Making the Most of Text Semantics to Improve Biomedical Vision-Language Processing](https://arxiv.org/abs/2204.09817) for additional details on the model training and evaluation.
-
-For additional inference pipelines with CXR-BERT, please refer to the [HI-ML-Multimodal GitHub](https://hi-ml.readthedocs.io/en/latest/multimodal.html) repository. The associated source files will soon be accessible through this link.
-
-
+Please refer to the corresponding paper, ["Making the Most of Text Semantics to Improve Biomedical Vision-Language Processing", ECCV'22](https://arxiv.org/abs/2204.09817) for additional details on the model training and evaluation.
 
+For additional inference pipelines with CXR-BERT, please refer to the [HI-ML-Multimodal GitHub](https://hi-ml.readthedocs.io/en/latest/multimodal.html) repository.
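The text tower described in this change can be queried for projected text embeddings via the Hugging Face hub. The snippet below is a minimal sketch following the loading pattern shown on the model cards; the method name `get_projected_text_embeddings` and the `trust_remote_code=True` loading path are taken from the CXR-BERT-specialized model card and may change between releases. The cosine-similarity helper is plain Python added here for illustration.

```python
# Sketch: compare two radiology sentences using CXR-BERT-specialized text
# embeddings. Assumes `transformers` and `torch` are installed and that the
# model card's remote-code loading path is still current.
import math


def cosine_similarity(u, v):
    # Plain-Python cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def embed_texts(prompts):
    # Requires network access; downloads the model on first call.
    from transformers import AutoModel, AutoTokenizer

    name = "microsoft/BiomedVLP-CXR-BERT-specialized"
    tokenizer = AutoTokenizer.from_pretrained(name, trust_remote_code=True)
    model = AutoModel.from_pretrained(name, trust_remote_code=True)
    tokens = tokenizer.batch_encode_plus(
        batch_text_or_text_pairs=prompts,
        add_special_tokens=True,
        padding="longest",
        return_tensors="pt",
    )
    # Projection head output, as documented on the model card (assumed API).
    return model.get_projected_text_embeddings(
        input_ids=tokens.input_ids,
        attention_mask=tokens.attention_mask,
    )
```

For example, embedding the prompts `"There is no pneumothorax."` and `"No pneumothorax is seen."` and comparing them with `cosine_similarity` should yield a high score, since the specialized model is trained to align paraphrased radiology findings.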