This leaderboard ranks embedding models for 3D brain structural MRIs, focusing on both image reconstruction and downstream task performance. The purpose is to provide quantitative benchmarks for brain structure embedding models based on both image compression and biological relevance.

Evaluations

  • Reconstruction Error:

  • Downstream Models: use embeddings/image-derived phenotypes to predict

    • Age (mean absolute error, MAE)
    • Sex (classification accuracy %)
    • Clinical Diagnosis (classification accuracy %)
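The two downstream metric types named above can be computed as in the following minimal sketch. The function names and the toy values are illustrative assumptions, not taken from the leaderboard's evaluation code.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted ages (years)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy_percent(labels_true, labels_pred):
    """Percentage of scans whose predicted label matches the true label."""
    if len(labels_true) != len(labels_pred) or not labels_true:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(labels_true, labels_pred))
    return 100.0 * correct / len(labels_true)


# Toy values for illustration only (not leaderboard results).
ages_true = [30.0, 45.0, 70.0]
ages_pred = [32.0, 41.0, 70.0]
age_mae = mean_absolute_error(ages_true, ages_pred)

sex_true = ["F", "M", "F", "M"]
sex_pred = ["F", "M", "M", "M"]
sex_acc = accuracy_percent(sex_true, sex_pred)

print(f"age MAE: {age_mae:.2f} years, sex accuracy: {sex_acc:.1f}%")
```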

Model info

Models can be evaluated if they meet the following criteria:

  • can accept 113x137x113 1.5mm^3 structural MRIs as input (see radiata-ai/brain-structure)
  • can be used in inference mode to produce embedding vectors and image reconstructions in a forward pass

Three types of models are considered: 1) autoencoders, 2) linear dimensionality reduction models like PCA, and 3) image-derived phenotypes (IDPs). Linear PCA models are a useful baseline for autoencoder models, which perform deep non-linear dimensionality reduction (2). IDP models comprise a set of features extracted from each scan, such as brain region gray matter volumes. They are evaluated only on downstream tasks, where they can be compared to embeddings in their ability to predict age, sex, and disease diagnosis.
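To make the linear baseline concrete, here is a minimal sketch of a PCA model that produces both an embedding vector and an image reconstruction for a volume. It is an assumption-laden illustration, not the leaderboard's PCA model: the data are random, the helper names `embed` and `reconstruct` are hypothetical, and a tiny toy shape stands in for the real 113x137x113 volumes to keep it fast.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for real scans (real inputs are 113x137x113).
toy_shape = (4, 5, 4)
n_scans, n_components = 20, 3
volumes = rng.normal(size=(n_scans, *toy_shape))

# Flatten each volume to a row vector and center on the mean volume.
X = volumes.reshape(n_scans, -1)
mean_volume = X.mean(axis=0)
X_centered = X - mean_volume

# Principal axes are the right singular vectors of the centered data.
_, _, vt = np.linalg.svd(X_centered, full_matrices=False)
components = vt[:n_components]  # shape: (n_components, n_voxels)


def embed(volume):
    """Project a volume onto the leading principal axes."""
    return (volume.reshape(-1) - mean_volume) @ components.T


def reconstruct(embedding):
    """Map an embedding back to image space."""
    return (embedding @ components + mean_volume).reshape(toy_shape)


embedding = embed(volumes[0])
reconstruction = reconstruct(embedding)
print(embedding.shape, reconstruction.shape)
```

An autoencoder would fill the same two roles with a learned non-linear encoder and decoder in place of the projection and back-projection.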

Example models include:

Evaluation datasets

  • radiata-ai/brain-structure, which has train, validation, and test splits (80%/10%/10%). This dataset includes 3794 anonymized 3D structural MRI brain scans (T1-weighted MPRAGE NIfTI files) from 2607 individuals included in five publicly available datasets: DLBS, IXI, NKI-RS, OASIS-1, and OASIS-2. Subjects have a mean age of 45 ± 24 (age range 6-98). 3529 scans come from cognitively normal individuals and 265 scans from individuals with an Alzheimer's disease clinical diagnosis. Scan image dimensions are 113x137x113, 1.5mm^3 resolution, aligned to MNI152 space. Splits are balanced for age, sex, clinical diagnosis, and study.
  • A private dataset is forthcoming.
  • Note: there is currently no guarantee that embedding models have not been trained on the validation/test datasets, hence the need for private datasets.

Downstream models

Downstream models are fit using feature vectors (embeddings or IDPs) for all scans from the training set, then applied to the validation and test sets to measure out-of-sample performance. For age, linear regression is used. For sex (genetic F/M) and clinical diagnosis (Alzheimer's disease (AD) vs. cognitively normal (CN)), linear discriminant analysis classification is used. Age and sex models are only fit and evaluated on scans from subjects with a clinical diagnosis of cognitively normal.
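The fit-on-train, evaluate-on-held-out procedure can be sketched as follows. This is a minimal NumPy illustration on synthetic embeddings, not the leaderboard's evaluation code. All function names and data are assumptions, and the two-class LDA is written out by hand (in practice a library implementation such as scikit-learn's would be the usual choice).

```python
import numpy as np

rng = np.random.default_rng(0)


def fit_linear_regression(X, y):
    """Ordinary least squares with an intercept term."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef


def predict_linear_regression(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef


def fit_lda(X, y):
    """Two-class LDA with a pooled covariance; y holds 0/1 labels."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    centered = np.vstack([X0 - mu0, X1 - mu1])
    cov = centered.T @ centered / (len(X) - 2)
    w = np.linalg.solve(cov, mu1 - mu0)
    # Decision threshold includes the log ratio of class priors.
    b = -0.5 * w @ (mu0 + mu1) + np.log(len(X1) / len(X0))
    return w, b


def predict_lda(model, X):
    w, b = model
    return (X @ w + b > 0).astype(int)


def make_split(n):
    """Synthetic 8-dimensional embeddings with age and sex signal."""
    sex = rng.integers(0, 2, size=n)
    emb = rng.normal(size=(n, 8))
    emb[:, 0] += 3.0 * sex  # sex shifts one embedding dimension
    age = 45.0 + 10.0 * emb[:, 1] + rng.normal(scale=2.0, size=n)
    return emb, age, sex


X_train, age_train, sex_train = make_split(400)
X_test, age_test, sex_test = make_split(100)

# Fit on the training split only, then score on the held-out split.
age_model = fit_linear_regression(X_train, age_train)
age_mae = np.mean(np.abs(predict_linear_regression(age_model, X_test) - age_test))

sex_model = fit_lda(X_train, sex_train)
sex_acc = 100.0 * np.mean(predict_lda(sex_model, X_test) == sex_test)

print(f"age MAE: {age_mae:.2f}, sex accuracy: {sex_acc:.1f}%")
```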

Rank computation

Each metric is ranked within its category for the test dataset results. Overall rank is computed by combining reconstruction and downstream ranks.
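One possible reading of this procedure is sketched below. The text above does not specify how ranks are combined, so this sketch assumes (a) per-metric ranks are averaged within each category and re-ranked, and (b) the overall rank comes from the mean of the two category ranks. The model names and scores are hypothetical placeholders, not leaderboard results.

```python
def rank_values(scores, higher_is_better):
    """Return {model: rank} with rank 1 as best; ties share the lowest rank."""
    ordered = sorted(scores.values(), reverse=higher_is_better)
    return {model: ordered.index(value) + 1 for model, value in scores.items()}


# Hypothetical test-set scores: (scores per model, higher_is_better).
results = {
    "reconstruction": {
        "reconstruction_error": (
            {"model_a": 0.010, "model_b": 0.020, "model_c": 0.015}, False),
    },
    "downstream": {
        "age_mae": ({"model_a": 6.0, "model_b": 5.0, "model_c": 7.0}, False),
        "sex_accuracy": ({"model_a": 90.0, "model_b": 95.0, "model_c": 85.0}, True),
    },
}


def category_rank(metrics):
    """Average per-metric ranks within a category, then re-rank (assumed rule)."""
    per_metric = [rank_values(s, hib) for s, hib in metrics.values()]
    models = per_metric[0].keys()
    mean_rank = {m: sum(r[m] for r in per_metric) / len(per_metric) for m in models}
    return rank_values(mean_rank, higher_is_better=False)


category_ranks = {name: category_rank(m) for name, m in results.items()}

# Assumed combination rule: mean of the category ranks, lower is better.
models = category_ranks["reconstruction"].keys()
combined = {
    m: sum(r[m] for r in category_ranks.values()) / len(category_ranks)
    for m in models
}
overall_rank = rank_values(combined, higher_is_better=False)
print(overall_rank)
```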

Repository

The evaluation code can be found in the Radiata leaderboard GitHub repo.

Citation

@misc{brain2vec-leaderboard,
  author = {Jesse Brown and Clayton Young},
  title = {Brain2vec Leaderboard},
  year = {2025},
  url = {https://huggingface.co/spaces/radiata-ai/brain2vec_leaderboard},
  publisher = {Hugging Face},
}

Contact

For any questions, or to submit a model, please contact jesse.brown@radiata.ai.

References

  1. Guo P, Zhao C, Yang D, Xu Z, Nath V, Tang Y, et al. MAISI: Medical AI for Synthetic Imaging [Internet]. arXiv; 2024. Available from: http://arxiv.org/abs/2409.11169
  2. Hinton GE, Salakhutdinov RR. Reducing the dimensionality of data with neural networks. Science. 2006 Jul 28;313(5786):504–7.
  3. Puglisi L, Alexander DC, Ravì D. Enhancing Spatiotemporal Disease Progression Models via Latent Diffusion and Prior Knowledge [Internet]. arXiv; 2024. Available from: http://arxiv.org/abs/2405.03328

Roadmap

  • Private evaluation dataset
  • Allow model submissions
  • Expanded image-derived phenotype set