GoodBaiBai88/M3D-CLIP

Image Feature Extraction

feature-extraction

3D medical CLIP

Image-text retrieval


GoodBaiBai88 committed on Apr 29

Commit 9903146 • 1 Parent(s): 51f3fd7

Update README.md

Files changed (1)

README.md +9 -3

README.md CHANGED

@@ -8,9 +8,15 @@ tags:
 - Image-text retrieval
 ---
-M3D-CLIP is a 3D medical CLIP model, which aligns vision and language through contrastive loss on [M3D-Cap](https://huggingface.co/datasets/GoodBaiBai88/M3D-Cap) dataset.
-The vision encoder uses 3D ViT with 32*256*256 image size and 4*16*16 patch size.
-The text encoder utilizes a pre-trained BERT as initialization.
+M3D-CLIP is one of the works in the [M3D](https://github.com/BAAI-DCAI/M3D) series.
+It is a 3D medical CLIP model that aligns vision and language through contrastive loss on the [M3D-Cap](https://huggingface.co/datasets/GoodBaiBai88/M3D-Cap) dataset.
+The vision encoder uses 3D ViT with 32\*256\*256 image size and 4\*16\*16 patch size.
+The language encoder utilizes a pre-trained BERT as initialization.
+The uses of M3D-CLIP:
+1. 3D medical image and text retrieval task.
+2. Aligned and powerful image and text features for downstream tasks.
+3. Text-aligned visual encoders are excellent pre-trained models for visual and multi-modal tasks.
 ![comparison](M3D_CLIP_table.png)
 ![comparison](itr_result.png)
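
The README gives a 32\*256\*256 image size and a 4\*16\*16 patch size for the 3D ViT vision encoder. As a rough sanity check of what those numbers imply, the sketch below computes the patch grid and the number of patches per volume. It is only an illustration, not code from the M3D repository: the helper `patch_grid` is hypothetical, the axis order (depth, height, width) is an assumption, and whether the model adds extra tokens (such as a class token) is not stated in the README.

```python
from math import prod


def patch_grid(image_size, patch_size):
    """Return the number of non-overlapping patches along each axis.

    Hypothetical helper for illustration; not part of the M3D codebase.
    """
    if len(image_size) != len(patch_size):
        raise ValueError("image_size and patch_size must have the same rank")
    grid = []
    for dim, patch in zip(image_size, patch_size):
        if dim % patch != 0:
            raise ValueError(f"{dim} is not divisible by patch size {patch}")
        grid.append(dim // patch)
    return tuple(grid)


# Sizes quoted in the README; the (depth, height, width) order is assumed.
image_size = (32, 256, 256)
patch_size = (4, 16, 16)

grid = patch_grid(image_size, patch_size)   # patches per axis
num_patches = prod(grid)                    # patches per input volume
voxels_per_patch = prod(patch_size)         # voxels covered by one patch

print(grid, num_patches, voxels_per_patch)  # (8, 16, 16) 2048 1024
```

Under these assumptions each input volume is split into an 8\*16\*16 grid, that is 2048 patches of 1024 voxels each.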