---
license: apache-2.0
tags:
- brain decoding
- multimodal-large-language-models
- brain-mri
- neuroimaging
datasets:
- weihaox/umbrae
---

<h2>UMBRAE: Unified Multimodal Brain Decoding (ECCV 2024)</h2>

<div>
<a href='https://weihaox.github.io/' target='_blank'>Weihao Xia</a><sup>1</sup>
<a href='https://team.inria.fr/rits/membres/raoul-de-charette/' target='_blank'>Raoul de Charette</a><sup>2</sup>
<a href='https://www.cl.cam.ac.uk/~aco41/' target='_blank'>Cengiz Öztireli</a><sup>3</sup>
<a href='http://www.homepages.ucl.ac.uk/~ucakjxu/' target='_blank'>Jing-Hao Xue</a><sup>1</sup>
</div>
<div>
<sup>1</sup>University College London
<sup>2</sup>Inria
<sup>3</sup>University of Cambridge
</div>

[🔋Online Demo](https://colab.research.google.com/drive/1VKd1gAB-6AIdMzBCG0J-U7h9vwsiKnHp) | [🌟GitHub](https://github.com/weihaox/UMBRAE) | [📜Paper](https://huggingface.co/papers/2404.07202)

<p>UMBRAE decodes multimodal explanations from brain signals. (1) We introduce a <b>universal brain encoder</b> for multimodal-brain alignment and recover conceptual and spatial details by using multimodal large language models. (2) We introduce <b>cross-subject training</b> to overcome the unique brain patterns of different individuals, allowing brain signals from multiple subjects to be trained within the same model. (3) Our method supports <b>weakly-supervised subject adaptation</b>, enabling the training of a model for a new subject in a data-efficient manner. (4) For evaluation, we introduce <b>BrainHub</b>, a brain understanding benchmark based on NSD and COCO.</p>

## Installation

### Environment

```bash
conda create -n brainx python=3.10
conda activate brainx
pip install -r requirements.txt
```

### Download Data and Checkpoints

The training and inference scripts can download the dataset automatically if the designated path is empty, but this process can be quite slow. If that happens, you can use the scripts below to download all data in advance. Please fill out the NSD [Data Access form](https://forms.gle/xue2bCdM9LaFNMeb7) and agree to the [Terms and Conditions](https://cvnlab.slite.page/p/IB6BSeW_7o/Terms-and-Conditions).

Checkpoints are available on [Hugging Face](https://huggingface.co/datasets/weihaox/brainx).

```bash
bash download_data.sh
bash download_checkpoint.sh
```
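
If the download script is slow in your environment, the checkpoints can also be pulled directly from the Hub with the `huggingface_hub` CLI. This is only a minimal sketch, assuming the checkpoints live in the `weihaox/brainx` dataset repository linked above; the target directory may need to be adjusted to match the layout expected by the inference scripts.

```bash
# Sketch: fetch the released checkpoints directly from the Hugging Face Hub.
# Assumes a recent huggingface_hub (`pip install -U huggingface_hub`) and that
# placing the files under train_logs/ matches the expected directory layout.
huggingface-cli download weihaox/brainx --repo-type dataset --local-dir train_logs
```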

## Inference

Our method inherits the multimodal understanding capabilities of MLLMs, so switching between tasks only requires changing the prompt. You can either use the prompts listed in our paper or write customised instructions for your own needs. Please specify the experiment name as `brainx-v-1-4` or `brainx`.

```bash
exp='brainx-v-1-4' # 'brainx'

prompt_caption='Describe this image <image> as simply as possible.'

for sub in 1 2 5 7
do
    python inference.py --data_path 'nsd_data' --fmri_encoder 'brainx' --subj $sub \
        --prompt "$prompt_caption" --brainx_path "train_logs/${exp}/last.pth" \
        --save_path "evaluation/eval_caption/${exp}"
done
```
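
To illustrate the prompt switching, the same script can be rerun with a customised instruction. The prompt and output directory below are our own illustrative choices, not prompts taken from the paper.

```bash
# Hypothetical example: same checkpoint, a customised prompt, a single subject.
# Reuses ${exp} as set above.
prompt_custom='Describe the main objects in this image <image> and where they are located.'

python inference.py --data_path 'nsd_data' --fmri_encoder 'brainx' --subj 1 \
    --prompt "$prompt_custom" --brainx_path "train_logs/${exp}/last.pth" \
    --save_path "evaluation/eval_custom/${exp}"
```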

Given that identified classes might be named differently from, or be entirely absent from, the ground-truth labels, we evaluate bounding boxes through referring expression comprehension (REC). We use the prompt `"Locate <expr> in <image> and provide its coordinates, please"`, but alternatives such as `"Can you point out <expr> in the image and provide the bounding boxes of its location?"` should also work.

```bash
for sub in 1 2 5 7
do
    python inference_rec.py --data_path 'nsd_data' --fmri_encoder 'brainx' \
        --subj $sub --brainx_path "train_logs/${exp}/last.pth" \
        --save_path "evaluation/eval_bbox_rec/${exp}/sub0${sub}_dim1024"
done
```

## Training

### Single-Subject Training

```bash
accelerate launch --num_processes=1 --num_machines=1 --gpu_ids='0' train.py \
    --data_path 'nsd_data' --fmri_encoder 'brainxs' --subj 1 \
    --model_save_path 'train_logs/demo_single_subject/sub01_dim1024'
```
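
The command above launches a single process on one GPU. As a sketch only, assuming the training script works under accelerate's multi-GPU data parallelism without changes (which we have not verified here), the same run could be spread over two GPUs:

```bash
# Sketch: the same single-subject run launched across two GPUs.
# Assumption: train.py supports accelerate's multi-GPU launch as-is.
accelerate launch --multi_gpu --num_processes=2 --num_machines=1 --gpu_ids='0,1' train.py \
    --data_path 'nsd_data' --fmri_encoder 'brainxs' --subj 1 \
    --model_save_path 'train_logs/demo_single_subject/sub01_dim1024'
```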

### Cross-Subject Training

```bash
accelerate launch --num_processes=1 --num_machines=1 --gpu_ids='0' train_brainx.py \
    --data_path 'nsd_data' --fmri_encoder 'brainx' --batch_size 128 --num_epochs 300 \
    --model_save_path 'train_logs/demo_cross_subject' --subj 1 2 5 7
```

### Weakly-Supervised Subject Adaptation

To adapt to a new subject, for example S7, first train a cross-subject model on the other available subjects (S1, S2, S5) using the command above, then train on the new subject with the following command.

```bash
sub=7
data_ratio=1.0
accelerate launch --num_processes=1 --num_machines=1 --gpu_ids='0' train_brainx_adaptation.py \
    --data_path 'nsd_data' --fmri_encoder 'brainxc' --batch_size 128 --num_epochs 240 \
    --subj $sub --data_ratio $data_ratio \
    --encoder_path 'train_logs/demo_cross_subject/brainx_adaptation_125/last.pth' \
    --model_save_path "train_logs/demo_weak_adaptation/brainx_adaptation_${sub}_${data_ratio}"
```
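
The `data_ratio` argument presumably controls the fraction of the new subject's training data that is used; lowering it is what makes the adaptation data-efficient. A hypothetical run that adapts to S7 with only 10% of its data:

```bash
# Hypothetical: adapt to S7 with only 10% of its training data (data_ratio is
# assumed to be a fraction in [0, 1], as suggested by the default of 1.0 above).
sub=7
data_ratio=0.1
accelerate launch --num_processes=1 --num_machines=1 --gpu_ids='0' train_brainx_adaptation.py \
    --data_path 'nsd_data' --fmri_encoder 'brainxc' --batch_size 128 --num_epochs 240 \
    --subj $sub --data_ratio $data_ratio \
    --encoder_path 'train_logs/demo_cross_subject/brainx_adaptation_125/last.pth' \
    --model_save_path "train_logs/demo_weak_adaptation/brainx_adaptation_${sub}_${data_ratio}"
```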

## Evaluation

The benchmark, including ground-truth data, evaluation scripts, and baseline results, is available at [BrainHub](https://github.com/weihaox/BrainHub).

1. Download BrainHub to the root path: `git clone https://github.com/weihaox/BrainHub`
2. Process the ground-truth test images: `python processing/decode_images.py`
3. Run the evaluation for brain captioning and grounding:

```bash
cd BrainHub
for sub in 1 2 5 7
do
    python eval_caption.py ../umbrae/evaluation/eval_caption/${exp}/sub0${sub}_dim1024/fmricap.json \
        caption/images --references_json caption/fmri_cococap.json
    python eval_bbox_rec.py --path_out "../umbrae/evaluation/eval_bbox_rec/${exp}/sub0${sub}_dim1024"
done
```

We also provide baseline results associated with [BrainHub](https://github.com/weihaox/BrainHub/tree/main/caption/comparison), including the captioning results from [SDRecon](https://github.com/yu-takagi/StableDiffusionReconstruction), [BrainCap](https://arxiv.org/abs/2305.11560), and [OneLLM](https://onellm.csuhan.com/), as well as the captioning and grounding results from [UMBRAE](https://weihaox.github.io/UMBRAE/).

## Acknowledgements

We thank the authors of [SDRecon](https://github.com/yu-takagi/StableDiffusionReconstruction), [BrainCap](https://arxiv.org/abs/2305.11560), and [OneLLM](https://onellm.csuhan.com/) for providing their code or results. We are also grateful for [NSD](https://naturalscenesdataset.org/) and [COCO](https://cocodataset.org/#home), which were used to construct BrainHub. The training script is based on [MindEye](https://medarc-ai.github.io/mindeye/). We use the pretrained models [Shikra](https://github.com/shikras/shikra) and [LLaVA](https://llava-vl.github.io/) as the MLLMs. Thanks for these awesome research works.

## Citation

```bibtex
@inproceedings{xia2024umbrae,
  author    = {Xia, Weihao and de Charette, Raoul and Öztireli, Cengiz and Xue, Jing-Hao},
  title     = {UMBRAE: Unified Multimodal Brain Decoding},
  booktitle = {European Conference on Computer Vision (ECCV)},
  year      = {2024},
}
```