---
license: bsd-3-clause
library_name: braindecode
pipeline_tag: feature-extraction
tags:
- eeg
- biosignal
- pytorch
- neuroscience
- braindecode
- foundation-model
- transformer
---

# EEGPT

EEGPT: Pretrained Transformer for Universal and Reliable Representation of EEG Signals, from Wang et al. (2024).

> **Architecture-only repository.** This repo documents the
> `braindecode.models.EEGPT` class. **No pretrained weights are
> distributed here**: instantiate the model and train it on your own
> data, or fine-tune from a published foundation-model checkpoint
> separately.

## Quick start

```bash
pip install braindecode
```

```python
from braindecode.models import EEGPT

model = EEGPT(
    n_chans=22,
    sfreq=200,
    input_window_seconds=4.0,
    n_outputs=2,
)
```

The signal-shape arguments above are example values, not universal
defaults; adjust them to match your recording.

## Documentation

- Full API reference (parameters, references, architecture figure):
  <https://braindecode.org/stable/generated/braindecode.models.EEGPT.html>
- Interactive browser with live instantiation:
  <https://huggingface.co/spaces/braindecode/model-explorer>
- Source on GitHub:
  <https://github.com/braindecode/braindecode/blob/master/braindecode/models/eegpt.py#L21>

## Architecture description

The section below is adapted from the rendered class docstring
(parameters, references, and architecture figure where available).

EEGPT: Pretrained Transformer for Universal and Reliable Representation
of EEG Signals from Wang et al. (2024) [eegpt].

*Foundation Model, Attention/Transformer*

![EEGPT Architecture](https://github.com/BINE022/EEGPT/raw/main/figures/EEGPT.jpg)

*Figure: a) The EEGPT structure patches the input EEG signal into
$p_{i,j}$ and applies masking (50% of time patches and 80% of channel
patches), creating a masked part $\mathcal{M}$ and an unmasked part
$\bar{\mathcal{M}}$. b) Local spatio-temporal embedding maps patches to
tokens. c) Dual self-supervised learning combines Spatio-Temporal
Representation Alignment and Mask-based Reconstruction.*

**EEGPT** is a pretrained transformer model designed for universal EEG
feature extraction. It addresses challenges like low SNR and
inter-subject variability by employing a dual self-supervised learning
method that combines **Spatio-Temporal Representation Alignment** and
**Mask-based Reconstruction** [eegpt].

**Model Overview (Layer-by-layer)** (a shape sketch follows the list)

1. **Patch embedding** (`_PatchEmbed` or `_PatchNormEmbed`): split each
   channel into `patch_size` time patches and project to `embed_dim`,
   yielding tokens with shape `(batch, n_patches, n_chans, embed_dim)`.
2. **Channel embedding** (`chan_embed`): add a learned embedding for
   each channel to preserve spatial identity before attention.
3. **Transformer encoder blocks** (`_EEGTransformer.blocks`): for each
   patch group, append `embed_num` learned summary tokens and process
   the sequence with multi-head self-attention and MLP layers.
4. **Summary extraction**: keep only the summary tokens, apply `norm`
   if set, and reshape back to `(batch, n_patches, embed_num, embed_dim)`.
5. **Task head** (`final_layer`): flatten summary tokens across patches
   and map to `n_outputs`; if `return_encoder_output=True`, return the
   encoder features instead.

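To make the end-to-end flow concrete, here is a minimal sketch that
uses only the public `EEGPT` class (argument values mirror the
quick-start example and are illustrative; the exact shapes should be
checked against your braindecode version):

```python
import torch
from braindecode.models import EEGPT

# Illustrative geometry: 22 channels, 4 s at 200 Hz -> 800 samples.
model = EEGPT(n_chans=22, sfreq=200, input_window_seconds=4.0, n_outputs=2)

x = torch.randn(8, 22, 800)  # (batch, n_chans, n_times)
logits = model(x)
print(logits.shape)          # expected (8, 2): output of the task head
```
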
**Dual Self-Supervised Learning**

EEGPT moves beyond simple masked reconstruction by introducing a
representation alignment objective. The pretraining loss $\mathcal{L}$
is the sum of alignment loss $\mathcal{L}_A$ and reconstruction loss
$\mathcal{L}_R$ (a code sketch of the combined loss follows the list
below):

$$
\mathcal{L} = \mathcal{L}_A + \mathcal{L}_R
$$

1. **Spatio-Temporal Representation Alignment** ($\mathcal{L}_A$):
   aligns the predicted features of masked regions with global features
   extracted by a momentum encoder. This forces the model to learn
   semantic, high-level representations rather than just signal
   waveform details.

   $$
   \mathcal{L}_A = \frac{1}{N} \sum_{j=1}^{N}
   \left\lVert \mathrm{pred}_j - \mathrm{LN}(\mathrm{menc}_j) \right\rVert_2^2
   $$

   where $\mathrm{pred}_j$ is the predictor output and
   $\mathrm{menc}_j$ is the momentum encoder output.

2. **Mask-based Reconstruction** ($\mathcal{L}_R$): a standard
   masked-autoencoder objective that reconstructs the raw EEG patches,
   ensuring local temporal fidelity.

   $$
   \mathcal{L}_R = \frac{1}{\lvert \mathcal{M} \rvert}
   \sum_{(i,j) \in \mathcal{M}}
   \left\lVert \mathrm{rec}_{i,j} - \mathrm{LN}(p_{i,j}) \right\rVert_2^2
   $$

   where $\mathrm{rec}_{i,j}$ is the reconstructed patch and $p_{i,j}$
   is the original patch.

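As a rough illustration of the two objectives, here is a hedged,
framework-level sketch of the combined loss (this is illustrative
code, not braindecode's pretraining implementation; `pred`, `menc`,
`rec`, and `patches` are assumed tensor names):

```python
import torch.nn.functional as F

def eegpt_pretraining_loss(pred, menc, rec, patches):
    """Illustrative sketch of L = L_A + L_R (not the reference code).

    pred:    predictor outputs for masked positions, (N, D)
    menc:    momentum-encoder targets for the same positions, (N, D)
    rec:     reconstructed patches at masked positions, (M, P)
    patches: original patches at the masked positions, (M, P)
    """
    # Alignment: match predictor output to layer-normalised momentum features.
    loss_a = F.mse_loss(pred, F.layer_norm(menc, menc.shape[-1:]))
    # Reconstruction: match reconstructed patches to layer-normalised originals.
    loss_r = F.mse_loss(rec, F.layer_norm(patches, patches.shape[-1:]))
    return loss_a + loss_r
```
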
**Macro Components**

- `EEGPT.target_encoder` **(Universal Encoder)**
  - *Operations.* A hierarchical backbone that consists of **Local
    Spatio-Temporal Embedding** followed by a standard Transformer
    encoder [eegpt].
  - *Role.* Maps raw spatio-temporal EEG patches into a sequence of
    latent tokens $z$.
- `EEGPT.chans_id` **(Channel Identification)**
  - *Operations.* A buffer containing channel indices mapped from the
    standard channel names provided in `chs_info` [eegpt].
  - *Role.* Provides the spatial identity for each input channel,
    allowing the model to look up the correct channel embedding vector
    $\varsigma_i$.
- **Local Spatio-Temporal Embedding** (Input Processing; a sketch of
  this step follows the list)
  - *Operations.* The input signal $X$ is chunked into patches
    $p_{i,j}$. Each patch is linearly projected and summed with a
    channel-specific embedding:
    $\mathrm{token}_{i,j} = \mathrm{Embed}(p_{i,j}) + \varsigma_i$ [eegpt].
  - *Role.* Converts the 2D EEG grid (channels $\times$ time) into a
    unified sequence of tokens that preserves both channel identity and
    temporal order.

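A hedged sketch of the embedding step above (the module and variable
names are illustrative, not braindecode internals, and patches are
taken non-overlapping for simplicity even though the real model also
supports a `patch_stride`):

```python
import torch
import torch.nn as nn

class LocalSpatioTemporalEmbed(nn.Module):
    """Illustrative: token_{i,j} = Embed(p_{i,j}) + channel embedding."""

    def __init__(self, n_chans: int, patch_size: int, embed_dim: int):
        super().__init__()
        self.patch_size = patch_size
        self.proj = nn.Linear(patch_size, embed_dim)        # Embed(.)
        self.chan_embed = nn.Embedding(n_chans, embed_dim)  # one vector per channel

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_chans, n_times)
        b, c, t = x.shape
        # -> (batch, n_chans, n_patches, patch_size), non-overlapping patches
        patches = x.unfold(-1, self.patch_size, self.patch_size)
        tokens = self.proj(patches)  # (batch, n_chans, n_patches, embed_dim)
        chan_ids = torch.arange(c, device=x.device)
        # Broadcast the channel embedding over batch and patch dimensions.
        return tokens + self.chan_embed(chan_ids)[None, :, None, :]
```
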
388
+ <p><strong>How the information is encoded temporally, spatially, and spectrally</strong></p>
389
+ <ul class="simple">
390
+ <li><p><strong>Temporal.</strong>
391
+ The model segments continuous EEG signals into small, non-overlapping patches (e.g., 250ms windows
392
+ with <span class="docutils literal">patch_size=64</span>) <a class="citation-reference" href="#eegpt" id="citation-reference-6" role="doc-biblioref">[eegpt]</a>. This <strong>Patching</strong> mechanism captures short-term local temporal
393
+ structure, while the subsequent Transformer encoder captures long-range temporal dependencies across
394
+ the entire window.</p></li>
395
+ <li><p><strong>Spatial.</strong>
396
+ Unlike convolutional models that may rely on fixed spatial order, EEGPT uses <strong>Channel Embeddings</strong>
397
+ <math xmlns="http://www.w3.org/1998/Math/MathML">
398
+ <msub>
399
+ <mi>ς</mi>
400
+ <mi>i</mi>
401
+ </msub>
402
+ </math> <a class="citation-reference" href="#eegpt" id="citation-reference-7" role="doc-biblioref">[eegpt]</a>. Each channel's data is treated as a distinct sequence of tokens tagged
403
+ with its spatial identity. This allows the model to flexibly handle different montages and
404
+ missing channels by simply mapping channel names to their corresponding learnable embeddings.</p></li>
405
+ <li><p><strong>Spectral.</strong>
406
+ Spectral information is implicitly learned through the <strong>Mask-based Reconstruction</strong> objective
407
+ (<math xmlns="http://www.w3.org/1998/Math/MathML">
408
+ <msub>
409
+ <mi>ℒ</mi>
410
+ <mi>R</mi>
411
+ </msub>
412
+ </math>) <a class="citation-reference" href="#eegpt" id="citation-reference-8" role="doc-biblioref">[eegpt]</a>. By forcing the model to reconstruct raw waveforms (including phase
413
+ and amplitude) from masked inputs, the model learns to encode frequency-specific patterns necessary
414
+ refines this by encouraging these spectral features to align with robust, high-level semantic representations.</p></li>
415
+ </ul>
**Pretrained Weights**

Weights are available on
[Hugging Face](https://huggingface.co/braindecode/eegpt-pretrained).

> **Important: pre-trained weights available.**
> This model has pre-trained weights available on the Hugging Face Hub
> ([link here](https://huggingface.co/braindecode/eegpt-pretrained)).
> Loading them, and pushing your own trained model to the Hub, requires
> installing `braindecode[hub]` for Hub integration; see the Notes
> section below for loading and pushing sketches.

**Usage**

The model can be initialized for specific downstream tasks (e.g.,
classification) by specifying `n_outputs`, `chs_info`, and `n_times`.

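A hedged instantiation sketch using `chs_info` and `n_times` (the
channel names, sampling rate, and MNE-based construction of the
channel metadata are illustrative assumptions):

```python
import mne
from braindecode.models import EEGPT

# Illustrative 10-20 montage; names and rate are assumptions.
ch_names = ["Fz", "Cz", "Pz", "Oz", "C3", "C4", "P3", "P4"]
info = mne.create_info(ch_names=ch_names, sfreq=256.0, ch_types="eeg")

model = EEGPT(
    n_outputs=2,
    chs_info=info["chs"],  # standard channel metadata (names -> chans_id)
    n_times=1024,          # 4 s at 256 Hz
    sfreq=256.0,
)

# For pure feature extraction, return encoder features instead of logits:
encoder = EEGPT(
    n_outputs=2,
    chs_info=info["chs"],
    n_times=1024,
    sfreq=256.0,
    return_encoder_output=True,
)
```
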
### Parameters

- `return_encoder_output` : bool, default=False
  Whether to return the encoder output or the classifier output.
- `patch_size` : int, default=64
  Size of the patches for the transformer.
- `patch_stride` : int, default=32
  Stride of the patches for the transformer.
- `embed_num` : int, default=4
  Number of summary tokens used for the global representation.
- `embed_dim` : int, default=512
  Dimension of the embeddings.
- `depth` : int, default=8
  Number of transformer layers.
- `num_heads` : int, default=8
  Number of attention heads.
- `mlp_ratio` : float, default=4.0
  Ratio of the MLP hidden dimension to the embedding dimension.
- `drop_prob` : float, default=0.0
  Dropout probability.
- `attn_drop_rate` : float, default=0.0
  Attention dropout rate.
- `drop_path_rate` : float, default=0.0
  Drop path rate.
- `init_std` : float, default=0.02
  Standard deviation for weight initialization.
- `qkv_bias` : bool, default=True
  Whether to use bias in the QKV projection.
- `norm_layer` : torch.nn.Module, default=None
  Normalization layer. If None, defaults to `nn.LayerNorm` with epsilon
  `layer_norm_eps`.
- `layer_norm_eps` : float, default=1e-6
  Epsilon value for the normalization layer.

### References

[eegpt] Wang, G., Liu, W., He, Y., Xu, C., Ma, L., & Li, H. (2024).
EEGPT: Pretrained transformer for universal and reliable representation
of EEG signals. *Advances in Neural Information Processing Systems*,
37, 39249-39280. Online:
<https://proceedings.neurips.cc/paper_files/paper/2024/file/4540d267eeec4e5dbd9dae9448f0b739-Paper-Conference.pdf>

### Notes

When loading pretrained weights from the original EEGPT checkpoint
(e.g., for fine-tuning), you may encounter "unexpected keys" related to
the `predictor` and `reconstructor` modules (e.g.,
`predictor.mask_token`, `reconstructor.time_embed`). These components
are used only during the self-supervised pre-training phase (Masked
Auto-Encoder) and are not part of this encoder-only model used for
downstream tasks. It is safe to ignore them.

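A hedged sketch of tolerant weight loading (the repo id and filename
are placeholders, and the checkpoint layout is an assumption;
`load_state_dict(strict=False)` is the standard PyTorch way to skip the
pre-training-only keys described above):

```python
import torch
from braindecode.models import EEGPT
from huggingface_hub import hf_hub_download

model = EEGPT(n_chans=22, sfreq=200, input_window_seconds=4.0, n_outputs=2)

# Placeholder repo id / filename; check the checkpoint you actually use.
ckpt_path = hf_hub_download(
    repo_id="braindecode/eegpt-pretrained",
    filename="checkpoint.pt",
)
ckpt = torch.load(ckpt_path, map_location="cpu")
state_dict = ckpt.get("state_dict", ckpt)  # unwrap if saved as a full checkpoint

# strict=False skips the pre-training-only predictor.*/reconstructor.* keys.
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print("ignored pre-training keys:", unexpected)
```
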
**Hugging Face Hub integration**

When the optional `huggingface_hub` package is installed, all models
automatically gain the ability to be pushed to and loaded from the
Hugging Face Hub. Install with:

```bash
pip install braindecode[hub]
```

The integration covers pushing a model to the Hub, loading a model from
the Hub, extracting features and replacing the head, and saving and
restoring the full configuration: all model parameters (both
EEG-specific and model-specific, such as dropout rates, activation
functions, and number of filters) are automatically saved to the Hub
and restored when loading.

See the *load-pretrained-models* tutorial in the braindecode
documentation for a complete walkthrough.

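Since the original code examples for these operations were not rendered
in this card, here is a hedged sketch of pushing weights using the
generic `huggingface_hub` API rather than braindecode's built-in
helpers (the repo id is a placeholder; see the braindecode tutorial for
the integrated workflow):

```python
import torch
from braindecode.models import EEGPT
from huggingface_hub import HfApi

model = EEGPT(n_chans=22, sfreq=200, input_window_seconds=4.0, n_outputs=2)
# ... train or fine-tune the model here ...

# Save the weights locally, then upload them to a (placeholder) repo.
torch.save(model.state_dict(), "eegpt_state_dict.pt")

api = HfApi()
api.create_repo("your-username/eegpt-finetuned", exist_ok=True)
api.upload_file(
    path_or_fileobj="eegpt_state_dict.pt",
    path_in_repo="eegpt_state_dict.pt",
    repo_id="your-username/eegpt-finetuned",
)
```
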
## Citation

Please cite both the original paper for this architecture (see the
*References* section above) and braindecode:

```bibtex
@article{aristimunha2025braindecode,
  title   = {Braindecode: a deep learning library for raw electrophysiological data},
  author  = {Aristimunha, Bruno and others},
  journal = {Zenodo},
  year    = {2025},
  doi     = {10.5281/zenodo.17699192},
}
```

## License

BSD-3-Clause for the model code (matching braindecode).
Pretraining-derived weights, if you fine-tune from a checkpoint,
inherit the license of that checkpoint and its training corpus.