Commit 9903146 by GoodBaiBai88
Parent: 51f3fd7

Update README.md

Files changed (1): README.md (+9 -3)
@@ -8,9 +8,15 @@ tags:
 - Image-text retrieval
 ---
 
-M3D-CLIP is a 3D medical CLIP model, which aligns vision and language through contrastive loss on [M3D-Cap](https://huggingface.co/datasets/GoodBaiBai88/M3D-Cap) dataset.
-The vision encoder uses 3D ViT with 32*256*256 image size and 4*16*16 patch size.
-The text encoder utilizes a pre-trained BERT as initialization.
+M3D-CLIP is one of the works in the [M3D](https://github.com/BAAI-DCAI/M3D) series.
+It is a 3D medical CLIP model that aligns vision and language through contrastive loss on the [M3D-Cap](https://huggingface.co/datasets/GoodBaiBai88/M3D-Cap) dataset.
+The vision encoder uses 3D ViT with 32\*256\*256 image size and 4\*16\*16 patch size.
+The language encoder utilizes a pre-trained BERT as initialization.
+The uses of M3D-CLIP:
+1. 3D medical image and text retrieval task.
+2. Aligned and powerful image and text features for downstream tasks.
+3. Text-aligned visual encoders are excellent pre-trained models for visual and multi-modal tasks.
+
 
 ![comparison](M3D_CLIP_table.png)
 ![comparison](itr_result.png)
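The encoder configuration in the README (32\*256\*256 volumes, 4\*16\*16 patches) determines how many tokens the 3D ViT processes. The arithmetic sketch below assumes non-overlapping patches, depth × height × width axis order, and no padding; the commit states none of these. `num_patches` is a hypothetical helper and is not part of M3D:

```python
# Hypothetical helper: token count for a 3D ViT, assuming non-overlapping
# patches, (depth, height, width) axis order, and no padding.
def num_patches(image_size, patch_size):
    """Return the per-axis patch grid and the total patch count (no class token)."""
    if len(image_size) != len(patch_size):
        raise ValueError("image_size and patch_size must have the same rank")
    grid = []
    for dim, patch in zip(image_size, patch_size):
        if dim % patch != 0:
            raise ValueError(f"dimension {dim} is not divisible by patch size {patch}")
        grid.append(dim // patch)
    total = 1
    for cells in grid:
        total *= cells
    return tuple(grid), total


# Sizes stated in the README: 32*256*256 input volume, 4*16*16 patches.
grid, tokens = num_patches((32, 256, 256), (4, 16, 16))
print(grid, tokens)  # (8, 16, 16) 2048
```

Under these assumptions, each patch covers 4 × 16 × 16 = 1,024 voxels, and each volume produces 8 × 16 × 16 = 2,048 patch tokens before any class token is added.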
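The README says only that vision and language are aligned "through contrastive loss". A common formulation is CLIP's symmetric cross-entropy over scaled cosine-similarity logits. The sketch below assumes that formulation and a fixed temperature of 0.07 (CLIP's initial value); neither is confirmed for M3D-CLIP. `clip_contrastive_loss` is a hypothetical name, and plain Python lists stand in for encoder outputs:

```python
import math


def l2_normalize(vec):
    # Scale to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def mean_matching_cross_entropy(logits):
    # For each row i, compute -log softmax(row)[i] using the log-sum-exp trick.
    total = 0.0
    for i, row in enumerate(logits):
        peak = max(row)
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss; image_emb[i] and text_emb[i] form a positive pair."""
    images = [l2_normalize(v) for v in image_emb]
    texts = [l2_normalize(v) for v in text_emb]
    # logits[i][j]: cosine similarity between image i and text j, divided by temperature
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in texts]
        for img in images
    ]
    image_to_text = mean_matching_cross_entropy(logits)
    text_to_image = mean_matching_cross_entropy([list(col) for col in zip(*logits)])
    return 0.5 * (image_to_text + text_to_image)


matched = clip_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
mismatched = clip_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(matched < mismatched)  # True
```

In this setup, the image embeddings would come from the 3D ViT and the text embeddings from the BERT-initialized language encoder. Averaging both directions penalizes a volume that fails to rank its own report highest, and a report that fails to rank its own volume highest.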
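The first listed use is 3D image-text retrieval. A common way to score retrieval is Recall@K: the fraction of queries whose paired item ranks among the top K candidates by similarity. The commit does not say whether the result figures use exactly this metric. The sketch assumes a square similarity matrix in which query i matches candidate i. `recall_at_k` and the similarity values are hypothetical:

```python
def recall_at_k(similarity, k):
    """Fraction of queries i whose ground-truth candidate i is among the top-k scores.

    similarity[i][j] is the score between query i and candidate j (assumed layout).
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Rank candidates by descending score; ties go to the lower index.
        ranked = sorted(range(len(row)), key=lambda j: (-row[j], j))
        if i in ranked[:k]:
            hits += 1
    return hits / len(similarity)


def transpose(matrix):
    # Swap query and candidate roles (image-to-text <-> text-to-image).
    return [list(col) for col in zip(*matrix)]


# Made-up cosine similarities: rows are images, columns are texts.
sim = [
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],
    [0.1, 0.2, 0.7],
]
print(recall_at_k(sim, 1))             # image-to-text R@1: 0.6666666666666666
print(recall_at_k(sim, 2))             # image-to-text R@2: 1.0
print(recall_at_k(transpose(sim), 1))  # text-to-image R@1: 0.6666666666666666
```

Transposing the matrix evaluates the other retrieval direction. Both directions reuse the same embeddings, which is why a single aligned model can serve image-to-text and text-to-image search.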