Decoding speech from non-invasive brain recordings

Published on Aug 25, 2022


Decoding language from brain activity is a long-awaited goal in both healthcare and neuroscience. Major milestones have recently been reached thanks to intracranial devices: subject-specific pipelines trained on invasive brain responses to basic language tasks now start to efficiently decode interpretable features (e.g. letters, words, spectrograms). However, scaling this approach to natural speech and non-invasive brain recordings remains a major challenge. Here, we propose a single end-to-end architecture trained with contrastive learning across a large cohort of individuals to predict self-supervised representations of natural speech. We evaluate our model on four public datasets, encompassing 169 volunteers recorded with magneto- or electro-encephalography (M/EEG), while they listened to natural speech. The results show that our model can identify, from 3s of MEG signals, the corresponding speech segment with up to 72.5% top-10 accuracy out of 1,594 distinct segments (and 44% top-1 accuracy), and up to 19.1% out of 2,604 segments for EEG recordings -- hence allowing the decoding of phrases absent from the training set. Model comparison and ablation analyses show that these performances directly benefit from our original design choices, namely the use of (i) a contrastive objective, (ii) pretrained representations of speech and (iii) a common convolutional architecture simultaneously trained across several participants. Together, these results delineate a promising path to decode natural language processing in real time from non-invasive recordings of brain activity.


Had to review a second one today since this one seems like something out of science fiction! My summary:

Researchers at Meta trained a deep learning model on brain recordings and audio data from 169 people listening to speech. From 3 seconds of MEG signal, their method picks the matching speech segment out of 1,594 candidates with up to 72.5% top-10 accuracy (44% top-1). EEG is much harder: up to 19.1% out of 2,604 segments.
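To make the metric concrete: "top-10 accuracy" counts a trial as a hit when the true segment is among the ten best-scoring candidates. Guessing at random among 1,594 segments would land in the top 10 only about 0.6% of the time (10/1,594). Here is a minimal NumPy sketch of how such a score could be computed. The function name, the similarity matrix layout, and the toy numbers are my own assumptions for illustration, not the authors' evaluation code.

```python
import numpy as np

def top_k_accuracy(scores, k):
    """Fraction of trials whose true segment ranks among the k best candidates.

    scores[i, j] is a similarity between brain window i and candidate
    segment j. By assumption here, the true match for window i is segment i.
    """
    scores = np.asarray(scores, dtype=float)
    true_scores = np.diag(scores)[:, None]
    # Rank of the true segment = number of candidates scoring strictly higher.
    ranks = (scores > true_scores).sum(axis=1)
    return float(np.mean(ranks < k))

# Toy example: 3 brain windows scored against 3 candidate segments.
toy_scores = np.array([
    [0.9, 0.1, 0.0],  # true segment ranked 1st
    [0.8, 0.5, 0.1],  # true segment ranked 2nd
    [0.3, 0.2, 0.1],  # true segment ranked 3rd
])
top1 = top_k_accuracy(toy_scores, k=1)
top2 = top_k_accuracy(toy_scores, k=2)
```

In the toy matrix only the first window is a top-1 hit, while two of the three windows are top-2 hits, which is why top-10 numbers always look better than top-1 numbers on the same model.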

This looks like a big step for non-invasive decoding. Until now, the major milestones the abstract mentions came from intracranial devices, using subject-specific pipelines trained on basic language tasks.

The key innovations:

  • A contrastive loss function that aligns latent speech and brain representations
  • Leveraging pretrained speech models like wav2vec 2.0
  • Training one model on multiple subjects with individual tuning
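For the first bullet, here is a rough sketch of what a contrastive objective of this kind can look like. It is a generic CLIP-style loss written in NumPy: each brain latent is scored against every speech latent in the batch, and the loss rewards giving the matched pair the highest score. The abstract only says "contrastive objective", so the exact form, the normalization, and all names below are assumptions on my part, not the paper's implementation.

```python
import numpy as np

def contrastive_loss(brain_latents, speech_latents):
    """CLIP-style loss (illustrative): row i of each array is a matched pair."""
    # Flatten every segment to a single vector: (batch, features).
    b = brain_latents.reshape(len(brain_latents), -1)
    s = speech_latents.reshape(len(speech_latents), -1)
    # L2-normalise so the dot product is a cosine similarity (an assumption).
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    s = s / np.linalg.norm(s, axis=1, keepdims=True)
    scores = b @ s.T  # scores[i, j]: brain window i vs. speech segment j
    # Numerically stable log-softmax over the candidate segments.
    scores = scores - scores.max(axis=1, keepdims=True)
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    # Cross-entropy with the matched segment (the diagonal) as the target.
    return float(-np.mean(np.diag(log_probs)))

rng = np.random.default_rng(0)
speech = rng.normal(size=(8, 16))                  # stand-in speech latents
brain = speech + 0.01 * rng.normal(size=(8, 16))   # nearly aligned brain latents

aligned_loss = contrastive_loss(brain, speech)
shuffled_loss = contrastive_loss(brain, np.roll(speech, 1, axis=0))
```

When the pairs line up the loss is lower than when the speech segments are shifted by one position, which is the signal a model would be trained to exploit. The same similarity matrix can then be reused at test time to rank candidate segments.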

Being able to decode speech intention from brainwaves could one day help restore communication for patients suffering from strokes, ALS, etc.

There's still a ways to go before this becomes a medical reality. Performance needs to improve and be validated during speech production rather than just passive listening. And the accuracy isn't high enough for natural conversations.

But this is a hugely promising step toward brain-computer interfaces. Really interesting work at the intersection of neuroscience and AI!

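TLDR: New model identifies which 3-second speech segment a person heard from non-invasive MEG recordings with up to 72.5% top-10 accuracy (up to 19.1% for EEG). Could eventually help patients with neurological conditions communicate, but so far it has only been tested on listening, not on producing speech.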

