MIRAGE: Adaptive Multimodal Gating for Whole-Brain fMRI Encoding
MIRAGE (Multimodal Integration with Representation-Adaptive Gated Encoding) is a brain encoder trained on the Algonauts 2025 challenge dataset. It takes multimodal hidden-state features extracted from Qwen3-Omni-30B-A3B-Thinking and predicts BOLD fMRI responses for 1,000 cortical parcels over windows of 100 TRs.
Model Description
Video / Audio / Transcript
-> Qwen3-Omni-30B-A3B-Thinking hidden states
-> per-modality layer pooler (24 learned queries)
-> linear projectors
-> temporal Transformer (8 layers, 8 heads, hidden dim 3072)
-> subject_linear readout
-> 100 TRs x 1,000 parcels
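The sizes in the pipeline above imply a few derived quantities. The sketch below collects the stated numbers in a small dataclass and computes them. The class name `MirageShapes` and its fields are hypothetical and are not part of the `brain_enc` package. The readout weight count assumes one dense, bias-free matrix per subject, which this card does not confirm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MirageShapes:
    """Hypothetical container for the sizes stated in the model description."""

    pooler_queries: int = 24      # learned queries per modality layer pooler
    transformer_layers: int = 8   # temporal Transformer depth
    transformer_heads: int = 8    # attention heads per layer
    hidden_dim: int = 3072        # temporal Transformer hidden size
    trs_per_window: int = 100     # TRs predicted per window
    n_parcels: int = 1000         # cortical parcels predicted per TR
    n_subjects: int = 4           # sub-01, sub-02, sub-03, sub-05

    @property
    def head_dim(self) -> int:
        # Standard multi-head attention splits the hidden size evenly across heads.
        if self.hidden_dim % self.transformer_heads != 0:
            raise ValueError("hidden_dim must be divisible by transformer_heads")
        return self.hidden_dim // self.transformer_heads

    @property
    def outputs_per_window(self) -> int:
        # Number of predicted values per window for a single subject.
        return self.trs_per_window * self.n_parcels

    def readout_weight_count(self) -> int:
        # ASSUMPTION: one dense hidden_dim -> n_parcels matrix per subject, no bias.
        return self.n_subjects * self.hidden_dim * self.n_parcels


shapes = MirageShapes()
print(shapes.head_dim, shapes.outputs_per_window, shapes.readout_weight_count())
```

Under these assumptions each attention head works on 384 dimensions, and one window yields 100,000 predicted values per subject.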
Training
| Hyperparameter | Value |
|---|---|
| Dataset | Algonauts 2025 (Friends TV + Movie10) |
| Subjects | sub-01, sub-02, sub-03, sub-05 |
| Val split | Friends season 6 hold-out |
| Epochs | 15 |
| Batch size | 16 |
| Optimizer | AdamW (lr=0.0001) |
| LR schedule | OneCycleLR |
| Mixed precision | 16-mixed |
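To give a feel for the one-cycle schedule listed above, here is a minimal pure-Python sketch of a cosine one-cycle curve. It assumes that lr=0.0001 is the peak learning rate and that the remaining settings (`pct_start=0.3`, `div_factor=25`, `final_div_factor=1e4`) follow the PyTorch `OneCycleLR` defaults. The table confirms neither assumption, and the total step count of 1000 is purely illustrative.

```python
import math


def one_cycle_lr(step, total_steps, max_lr=1e-4, pct_start=0.3,
                 div_factor=25.0, final_div_factor=1e4):
    """Learning rate at `step` for a two-phase cosine one-cycle schedule.

    ASSUMPTION: defaults mirror PyTorch's OneCycleLR; the actual MIRAGE
    training settings beyond lr=0.0001 are not stated in this card.
    """
    initial_lr = max_lr / div_factor          # starting learning rate
    min_lr = initial_lr / final_div_factor    # final learning rate
    warmup_end = pct_start * total_steps - 1  # last step of the ramp-up phase
    last_step = total_steps - 1

    def cos_anneal(start, end, pct):
        # Cosine interpolation from `start` (pct=0) to `end` (pct=1).
        return end + (start - end) / 2.0 * (math.cos(math.pi * pct) + 1.0)

    if step <= warmup_end:
        return cos_anneal(initial_lr, max_lr, step / warmup_end)
    return cos_anneal(max_lr, min_lr,
                      (step - warmup_end) / (last_step - warmup_end))


total_steps = 1000  # hypothetical; the real value depends on dataset size
for step in (0, 299, 999):
    print(step, one_cycle_lr(step, total_steps))
```

With these settings the rate ramps from max_lr / 25 up to the peak over the first 30% of training, then decays toward a value far below the starting rate.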
Evaluation
MIRAGE results on the Algonauts 2025 CNeuroMod splits. Values are mean Pearson r across the four trained subjects. Friends s06 is the held-out validation split used during development; Friends s07 is the held-out in-distribution benchmark; OOD is the held-out movie benchmark.
| Model | Friends s06 eval | Friends s07 held-out in-dist eval | OOD eval | Notes |
|---|---|---|---|---|
| MIRAGE single model | 0.319 | 0.310 | 0.217 | Hugging Face checkpoint |
| MIRAGE 15-member ensemble | 0.335 | 0.323 | 0.227 | Algonauts 2025 final submission ensemble |
Per-subject Pearson r on the OOD test set:
| Subject | Pearson r |
|---|---|
| sub-01 | 0.244 |
| sub-02 | 0.210 |
| sub-03 | 0.235 |
| sub-05 | 0.179 |
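The reported scores can be related to each other with a short sketch. It assumes that Pearson r is computed per parcel over time and then averaged across parcels before averaging across subjects; this card only states the final average across subjects. The function names are illustrative and do not come from the `brain_enc` package.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def parcelwise_score(pred, true):
    """Mean Pearson r across parcels.

    `pred` and `true` are lists of time points, each a list of parcel values.
    ASSUMPTION: parcels are weighted equally in the average.
    """
    n_parcels = len(pred[0])
    return mean(
        pearson_r([row[p] for row in pred], [row[p] for row in true])
        for p in range(n_parcels)
    )


# Per-subject OOD scores copied from the table above.
ood_scores = {"sub-01": 0.244, "sub-02": 0.210, "sub-03": 0.235, "sub-05": 0.179}
ood_mean = mean(list(ood_scores.values()))
print(round(ood_mean, 3))
```

The average of the four per-subject values matches the single-model OOD score of 0.217 in the results table.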
Usage
git clone https://github.com/epflneuroailab/mirage
cd mirage
pip install -e .
python -m brain_enc.cli.infer_fmri \
--video /path/to/video.mp4 \
--transcript /path/to/transcript.json \
--run-dir /path/to/downloaded/hf/files \
--subject-idx 0 \
--output fmri_predictions.npy
For direct loading, download model.safetensors and config.yaml from
epfl-neuroai/mirage, build the configured brain_enc model, and load weights with
load_model_state(model, "model.safetensors").
Limitations
- Predictions are conditioned on one of the four trained Algonauts 2025 subjects.
- Performance is expected to be strongest on Friends-style narrative video.
- Full raw-video extraction requires the Qwen3-Omni feature backbone and a large GPU.
Citation
@misc{gokce2026mirage,
title = {MIRAGE: Adaptive Multimodal Gating for Whole-Brain fMRI Encoding},
author = {Gokce, Abdulkadir and AlKhamissi, Badr and Schrimpf, Martin},
year = {2026},
eprint = {2605.29850},
archivePrefix = {arXiv},
primaryClass = {cs.LG},
url = {https://arxiv.org/abs/2605.29850}
}