---
license: apache-2.0
tags:
- brain decoding
- multimodal-large-language-models
- brain-mri
- neuroimaging
datasets:
- weihaox/umbrae
---

<h2>UMBRAE: Unified Multimodal Brain Decoding (ECCV 2024)</h2>

<div>
<a href='https://weihaox.github.io/' target='_blank'>Weihao Xia</a><sup>1</sup>&emsp;
<a href='https://team.inria.fr/rits/membres/raoul-de-charette/' target='_blank'>Raoul de Charette</a><sup>2</sup>&emsp;
<a href='https://www.cl.cam.ac.uk/~aco41/' target='_blank'>Cengiz Öztireli</a><sup>3</sup>&emsp;
<a href='http://www.homepages.ucl.ac.uk/~ucakjxu/' target='_blank'>Jing-Hao Xue</a><sup>1</sup>
</div>
<div>
<sup>1</sup>University College London&emsp;
<sup>2</sup>Inria&emsp;
<sup>3</sup>University of Cambridge
</div>

[🔋Online Demo](https://colab.research.google.com/drive/1VKd1gAB-6AIdMzBCG0J-U7h9vwsiKnHp) | [🌟GitHub](https://github.com/weihaox/UMBRAE) | [📜Paper](https://huggingface.co/papers/2404.07202)

<p>UMBRAE decodes multimodal explanations from brain signals. (1) We introduce a <b>universal brain encoder</b> for multimodal-brain alignment that recovers conceptual and spatial details by using multimodal large language models. (2) We introduce <b>cross-subject training</b> to overcome the unique brain patterns of different individuals, allowing brain signals from multiple subjects to be trained within the same model. (3) Our method supports <b>weakly-supervised subject adaptation</b>, enabling a model to be trained for a new subject in a data-efficient manner. (4) For evaluation, we introduce <b>BrainHub</b>, a brain understanding benchmark based on NSD and COCO.</p>

## Installation

### Environment

```bash
conda create -n brainx python=3.10
conda activate brainx
pip install -r requirements.txt
```

### Download Data and Checkpoints

The training and inference scripts can automatically download the dataset if the designated path is empty, but this process can be quite slow. If that happens, you can use the following scripts to download all data in advance. Please first fill out the NSD [Data Access form](https://forms.gle/xue2bCdM9LaFNMeb7) and agree to the [Terms and Conditions](https://cvnlab.slite.page/p/IB6BSeW_7o/Terms-and-Conditions).

Download checkpoints from [Hugging Face](https://huggingface.co/datasets/weihaox/brainx).

```bash
bash download_data.sh
bash download_checkpoint.sh
```
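
If the shell scripts do not fit your setup, the checkpoints can also be fetched directly from the [weihaox/brainx](https://huggingface.co/datasets/weihaox/brainx) repository on the Hub. The snippet below is a minimal sketch using `huggingface-cli`; the local directory name is an assumption, so adjust it to the paths expected by the scripts above.

```bash
# Sketch: pull the weihaox/brainx dataset repository (checkpoints) from the Hugging Face Hub.
pip install -U "huggingface_hub[cli]"
huggingface-cli download weihaox/brainx --repo-type dataset --local-dir brainx
```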

## Inference

Our method inherits the multimodal understanding capabilities of MLLMs, so switching between tasks only requires changing the prompt. You can either use the prompts listed in our paper or create customised instructions according to your actual needs. Please specify the experiment as `brainx-v-1-4` or `brainx`.

```bash
exp='brainx-v-1-4' # or 'brainx'

prompt_caption='Describe this image <image> as simply as possible.'

for sub in 1 2 5 7
do
    python inference.py --data_path 'nsd_data' --fmri_encoder 'brainx' --subj $sub \
        --prompt "$prompt_caption" --brainx_path "train_logs/${exp}/last.pth" \
        --save_path "evaluation/eval_caption/${exp}"
done
```
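
Other tasks reuse the same script with a different prompt. As an illustrative sketch (the prompt wording and output directory below are examples, not taken from the paper), a more detailed captioning instruction could be run as follows:

```bash
# Illustrative prompt and save path; the checkpoint and encoder settings are unchanged.
prompt_detail='Describe this image <image> in detail.'

for sub in 1 2 5 7
do
    python inference.py --data_path 'nsd_data' --fmri_encoder 'brainx' --subj $sub \
        --prompt "$prompt_detail" --brainx_path "train_logs/${exp}/last.pth" \
        --save_path "evaluation/eval_caption_detail/${exp}"
done
```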

Because identified classes might be named differently from, or be entirely absent from, the ground-truth labels, we evaluate bounding boxes through referring expression comprehension (REC). We use the prompt `"Locate <expr> in <image> and provide its coordinates, please"`, but others such as `"Can you point out <expr> in the image and provide the bounding boxes of its location?"` should also work.

```bash
for sub in 1 2 5 7
do
    python inference_rec.py --data_path 'nsd_data' --fmri_encoder 'brainx' \
        --subj $sub --brainx_path "train_logs/${exp}/last.pth" \
        --save_path "evaluation/eval_bbox_rec/${exp}/sub0${sub}_dim1024"
done
```

## Training

### Single-Subject Training

```bash
accelerate launch --num_processes=1 --num_machines=1 --gpu_ids='0' train.py \
    --data_path 'nsd_data' --fmri_encoder 'brainxs' --subj 1 \
    --model_save_path 'train_logs/demo_single_subject/sub01_dim1024'
```

### Cross-Subject Training

```bash
accelerate launch --num_processes=1 --num_machines=1 --gpu_ids='0' train_brainx.py \
    --data_path 'nsd_data' --fmri_encoder 'brainx' --batch_size 128 --num_epochs 300 \
    --model_save_path 'train_logs/demo_cross_subject' --subj 1 2 5 7
```

### Weakly-Supervised Subject Adaptation

To adapt to a new subject, for example S7, first train a model on the other available subjects (S1, S2, S5) using the cross-subject training above, then train on the new subject with the following command.

```bash
sub=7
data_ratio=1.0
accelerate launch --num_processes=1 --num_machines=1 --gpu_ids='0' train_brainx_adaptation.py \
    --data_path 'nsd_data' --fmri_encoder 'brainxc' --batch_size 128 --num_epochs 240 \
    --subj $sub --data_ratio $data_ratio \
    --encoder_path 'train_logs/demo_cross_subject/brainx_adaptation_125/last.pth' \
    --model_save_path "train_logs/demo_weak_adaptation/brainx_adaptation_${sub}_${data_ratio}"
```
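
To study data efficiency, the same command can be repeated with smaller values of `data_ratio`. The sweep below is only a sketch; the listed ratios are illustrative and assume the script accepts an arbitrary fraction of the new subject's training data.

```bash
# Sketch: adapt to S7 with progressively less of its training data (illustrative ratios).
sub=7
for data_ratio in 0.1 0.5 1.0
do
    accelerate launch --num_processes=1 --num_machines=1 --gpu_ids='0' train_brainx_adaptation.py \
        --data_path 'nsd_data' --fmri_encoder 'brainxc' --batch_size 128 --num_epochs 240 \
        --subj $sub --data_ratio $data_ratio \
        --encoder_path 'train_logs/demo_cross_subject/brainx_adaptation_125/last.pth' \
        --model_save_path "train_logs/demo_weak_adaptation/brainx_adaptation_${sub}_${data_ratio}"
done
```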

## Evaluation

The benchmark, including ground-truth data, evaluation scripts, and baseline results, is available in [BrainHub](https://github.com/weihaox/BrainHub).

1. Clone BrainHub to the root path: `git clone https://github.com/weihaox/BrainHub`
2. Process the ground-truth test images: `python processing/decode_images.py`
3. Run the evaluation for brain captioning and grounding:

```bash
cd BrainHub
for sub in 1 2 5 7
do
    python eval_caption.py ../umbrae/evaluation/eval_caption/${exp}/sub0${sub}_dim1024/fmricap.json \
        caption/images --references_json caption/fmri_cococap.json
    python eval_bbox_rec.py --path_out "../umbrae/evaluation/eval_bbox_rec/${exp}/sub0${sub}_dim1024"
done
```

We also provide baseline results associated with [BrainHub](https://github.com/weihaox/BrainHub/tree/main/caption/comparison), including the captioning results from [SDRecon](https://github.com/yu-takagi/StableDiffusionReconstruction), [BrainCap](https://arxiv.org/abs/2305.11560), and [OneLLM](https://onellm.csuhan.com/), as well as the captioning and grounding results from [UMBRAE](https://weihaox.github.io/UMBRAE/).

## Acknowledgements

We thank the authors of [SDRecon](https://github.com/yu-takagi/StableDiffusionReconstruction), [BrainCap](https://arxiv.org/abs/2305.11560), and [OneLLM](https://onellm.csuhan.com/) for providing their code and results. We are also grateful for [NSD](https://naturalscenesdataset.org/) and [COCO](https://cocodataset.org/#home), which were used to construct BrainHub. The training script is based on [MindEye](https://medarc-ai.github.io/mindeye/). We use the pretrained models [Shikra](https://github.com/shikras/shikra) and [LLaVA](https://llava-vl.github.io/) as the MLLMs. Thanks for these awesome research works.

## Citation

```bibtex
@inproceedings{xia2024umbrae,
  author    = {Xia, Weihao and de Charette, Raoul and Öztireli, Cengiz and Xue, Jing-Hao},
  title     = {UMBRAE: Unified Multimodal Brain Decoding},
  booktitle = {European Conference on Computer Vision (ECCV)},
  year      = {2024},
}
```