Commit 9903146
1 Parent(s): 51f3fd7
Update README.md
README.md CHANGED
@@ -8,9 +8,15 @@ tags:
 - Image-text retrieval
 ---
 
-M3D-CLIP is
-
-The
+M3D-CLIP is one of the works in the [M3D](https://github.com/BAAI-DCAI/M3D) series.
+It is a 3D medical CLIP model that aligns vision and language through contrastive loss on the [M3D-Cap](https://huggingface.co/datasets/GoodBaiBai88/M3D-Cap) dataset.
+The vision encoder uses 3D ViT with 32\*256\*256 image size and 4\*16\*16 patch size.
+The language encoder utilizes a pre-trained BERT as initialization.
+The uses of M3D-CLIP:
+1. 3D medical image and text retrieval task.
+2. Aligned and powerful image and text features for downstream tasks.
+3. Text-aligned visual encoders are excellent pre-trained models for visual and multi-modal tasks.
+
 
 ![comparison](M3D_CLIP_table.png)
 ![comparison](itr_result.png)
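The README text added in this commit gives the image size (32\*256\*256), the patch size (4\*16\*16), and says the model is trained with a contrastive loss. It does not spell out the patch arithmetic or the loss itself. The sketch below is an illustration, not the M3D-CLIP implementation. It assumes non-overlapping patches, a (depth, height, width) axis order, and the standard CLIP-style symmetric cross-entropy over cosine similarities. The function names `patch_grid` and `clip_contrastive_loss` and the `temperature=0.07` default are hypothetical choices for this example.

```python
import math


def patch_grid(image_size, patch_size):
    """Patches per axis for a non-overlapping ViT patch embedding (assumed)."""
    if any(i % p for i, p in zip(image_size, patch_size)):
        raise ValueError("image size must be divisible by patch size on every axis")
    return tuple(i // p for i, p in zip(image_size, patch_size))


def _normalize(vec):
    # L2-normalize so dot products become cosine similarities.
    # Assumes the vector is non-zero.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def _cross_entropy(logits, target):
    # Numerically stable -log softmax(logits)[target].
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum_exp - logits[target]


def clip_contrastive_loss(image_feats, text_feats, temperature=0.07):
    """Symmetric contrastive loss; row k of each input is a matched pair.

    The temperature value is an assumption, not taken from the README.
    """
    img = [_normalize(v) for v in image_feats]
    txt = [_normalize(v) for v in text_feats]
    n = len(img)
    # logits[r][c] = scaled cosine similarity of image r and text c.
    logits = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txt]
              for i in img]
    # Image-to-text: each row should pick out its own column.
    loss_i2t = sum(_cross_entropy(logits[k], k) for k in range(n)) / n
    # Text-to-image: each column should pick out its own row.
    loss_t2i = sum(
        _cross_entropy([logits[r][k] for r in range(n)], k) for k in range(n)
    ) / n
    return 0.5 * (loss_i2t + loss_t2i)


grid = patch_grid((32, 256, 256), (4, 16, 16))
num_patches = math.prod(grid)
print(grid, num_patches)
```

Under these assumptions the grid is 8 × 16 × 16, which gives 2048 patch tokens per volume before any class token is added. The loss is small when matched image and text features point in the same direction and large when the pairs are shuffled.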