# 🌟 Beyond Modality Collapse: Representations Blending for Multimodal Dataset Distillation

## NeurIPS 2025

> [Beyond Modality Collapse: Representations Blending for Multimodal Dataset Distillation](https://arxiv.org/pdf/2505.14705)<br>
> [Xin Zhang](https://zhangxin-xd.github.io/), Ziruo Zhang, [Jiawei Du](https://scholar.google.com/citations?user=WrJKEzEAAAAJ&hl=zh-CN), [Zuozhu Liu](https://person.zju.edu.cn/en/lzz), [Joey Tianyi Zhou](https://joeyzhouty.github.io/) <br>
> Agency for Science, Technology and Research (A*STAR), Singapore <br>
> National University of Singapore, Singapore <br>
> Zhejiang University, China <br>
## 📖 Introduction

<p align="center">
<img src="imgs/problem.png" alt="problem" title="problem" width="700">
</p>

<p align="justify">
<strong>Multimodal embedding distributions across distillation methods</strong>:
We extract image and text embeddings from a fine-tuned CLIP model and project them into a shared representation space using DOSNES.
Red triangles and blue circles denote image and text embeddings, respectively.
Left: embeddings of randomly sampled data from the original dataset are well spread and modality-aligned.
Middle: the distilled dataset produced by a state-of-the-art multimodal dataset distillation (MDD) method, LoRS, exhibits modality collapse: image and text embeddings are poorly aligned and concentrated in distinct regions.
Right: our method effectively mitigates modality collapse, yielding a distribution that better preserves cross-modal alignment and exhibits greater representational diversity.
</p>

## ⚙️ Installation

To get started, follow these instructions to set up the environment and install dependencies.

1. **Clone this repository**:
   ```bash
   git clone https://github.com/zhangxin-xd/RepBlend.git
   cd RepBlend
   ```

2. **Install the required packages**:
   ```bash
   conda create -n RepBlend python=3.10
   conda activate RepBlend
   pip install -r requirements.txt
   ```
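As an optional sanity check before the steps above, the following sketch (a convenience helper, not part of the repository) reports whether the tools this README relies on are available on your PATH:

```shell
# Report availability of the tools used in this README.
# This helper is a sketch and is not part of the repository.
for tool in git conda pip; do
  if command -v "$tool" >/dev/null 2>&1; then
    status=found
  else
    status=missing
  fi
  echo "$tool: $status"
done
```

If anything reports `missing`, install it before continuing.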
---

## 🚀 Usage

Here’s how to use RepBlend for multimodal dataset distillation.

First, download the pretrained weights and datasets and place them in their respective folders.

### Pretrained Weights
The checkpoints for all experimental networks are available from their respective official repositories. For convenience, we have also provided them together [🤗 here](https://huggingface.co/xinxin66/RepBlend).
Once downloaded, put them in `distill_utils/checkpoints/`.

### Experimental Datasets
Our method has been validated on several benchmarks; the datasets can be downloaded from the links below. Once downloaded, put them in `distill_utils/data/`.

| Dataset | Links |
|-----|-----|
| Flickr30K | [images](https://www.kaggle.com/datasets/hsankesara/flickr-image-dataset), [🤗 annotations](https://huggingface.co/xinxin66/RepBlend/) |
| COCO | [images](https://cocodataset.org/#download), [🤗 annotations](https://huggingface.co/xinxin66/RepBlend) |
| LLaVA-cc3m | [images](https://github.com/haotian-liu/LLaVA/blob/main/docs/Data.md), [🤗 annotations](https://huggingface.co/xinxin66/RepBlend) |

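Since the scripts read from fixed locations, it can help to create the expected folder layout before downloading. The snippet below is a convenience sketch using the paths named above; it is not part of the repository:

```shell
# Create the folder layout the README expects (paths taken from the
# sections above), then list them so downloads have a known target.
mkdir -p distill_utils/checkpoints distill_utils/data
ls -d distill_utils/checkpoints distill_utils/data
```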
### Generate Expert Trajectories
You can generate expert trajectories by running `scripts/buffer.sh`, or alternatively download our [🤗 pre-generated trajectories](https://huggingface.co/xinxin66/RepBlend) for faster reproduction.
```bash
bash scripts/buffer.sh
```
### Distill Multimodal Dataset
You can distill multimodal datasets with RepBlend by running `scripts/distill_coco_repblend.sh` and `scripts/distill_flickr_repblend.sh`.
```bash
bash scripts/distill_coco_repblend.sh
bash scripts/distill_flickr_repblend.sh
```
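To run both distillation scripts back to back while keeping their logs, a small wrapper like the following can be used. It is a hypothetical convenience script (not part of the repository) that assumes the script names above and skips any script that is missing:

```shell
#!/bin/sh
# Hypothetical wrapper: run each distillation script in turn and capture
# its output under logs/. Scripts that are not present are skipped.
mkdir -p logs
for name in distill_coco_repblend distill_flickr_repblend; do
  script="scripts/${name}.sh"
  if [ -f "$script" ]; then
    bash "$script" 2>&1 | tee "logs/${name}.log"
  else
    echo "skipping ${name}: ${script} not found"
  fi
done
```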

## 📊 Results

Our experiments demonstrate the effectiveness of the proposed approach across various benchmarks.
<div style="display: flex; justify-content: center; align-items: center;">
  <img src="imgs/results 1.png" alt="Results 1" width="800"/>
</div>
<br>
<div style="display: flex; justify-content: center; align-items: center;">
  <img src="imgs/table 1.png" alt="table 1" width="400"/>
  <img src="imgs/table 2.png" alt="table 2" width="400"/>
</div>

For detailed experimental results and further analysis, please refer to the full paper.

---

## 📑 Citation

If you find this code useful in your research, please consider citing our work:

```bibtex
@inproceedings{RepBlend2025neurips,
  title={Beyond Modality Collapse: Representations Blending for Multimodal Dataset Distillation},
  author={Zhang, Xin and Zhang, Ziruo and Du, Jiawei and Liu, Zuozhu and Zhou, Joey Tianyi},
  booktitle={Adv. Neural Inf. Process. Syst. (NeurIPS)},
  year={2025}
}
```
---
## 🎉 Reference
Our code builds on the following prior works:
- [LoRS: Low-Rank Similarity Mining](https://github.com/silicx/LoRS_Distill)
- [Vision-Language Dataset Distillation](https://github.com/princetonvisualai/multimodal_dataset_distillation)
- [Scaling Up Dataset Distillation to ImageNet-1K with Constant Memory (TESLA)](https://github.com/justincui03/tesla)