---
license: bsd-3-clause
library_name: braindecode
pipeline_tag: feature-extraction
tags:
- eeg
- biosignal
- pytorch
- neuroscience
- braindecode
- convolutional
- transformer
---

# EEGConformer

EEG Conformer from Song et al. (2022).

> **Architecture-only repository.** This repo documents the
> `braindecode.models.EEGConformer` class. **No pretrained weights are
> distributed here** — instantiate the model and train it on your own
> data, or fine-tune from a published foundation-model checkpoint
> separately.

## Quick start

```bash
pip install braindecode
```

```python
from braindecode.models import EEGConformer

model = EEGConformer(
    n_chans=22,
    sfreq=250,
    input_window_seconds=4.0,
    n_outputs=4,
)
```

The signal-shape arguments above are example defaults — adjust them
to match your recording.
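
As a quick sanity check, you can push a dummy batch through the untrained model and confirm
the output shape. The `(batch, n_chans, n_times)` input layout and the 1000-sample window
(4 s at 250 Hz) simply follow the example configuration above.

```python
import torch
from braindecode.models import EEGConformer

model = EEGConformer(
    n_chans=22,
    sfreq=250,
    input_window_seconds=4.0,
    n_outputs=4,
)

# Dummy batch: 8 windows, 22 channels, 1000 samples (4 s at 250 Hz)
x = torch.randn(8, 22, 1000)
with torch.no_grad():
    y = model(x)
print(y.shape)  # expected: torch.Size([8, 4])
```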

## Documentation

- Full API reference (parameters, references, architecture figure):
  <https://braindecode.org/stable/generated/braindecode.models.EEGConformer.html>
- Interactive browser with live instantiation:
  <https://huggingface.co/spaces/braindecode/model-explorer>
- Source on GitHub: <https://github.com/braindecode/braindecode/blob/master/braindecode/models/eegconformer.py#L14>

## Architecture description

The block below is the rendered class docstring (parameters,
references, architecture figure where available).

<div class='bd-doc'><main>
<p>EEG Conformer from Song et al. (2022) [song2022]_.</p>
<span style="display:inline-block;padding:2px 8px;border-radius:4px;background:#5cb85c;color:white;font-size:11px;font-weight:600;margin-right:4px;">Convolution</span><span style="display:inline-block;padding:2px 8px;border-radius:4px;background:#56B4E9;color:white;font-size:11px;font-weight:600;margin-right:4px;">Attention/Transformer</span>

.. figure:: https://raw.githubusercontent.com/eeyhsong/EEG-Conformer/refs/heads/main/visualization/Fig1.png
   :align: center
   :alt: EEGConformer Architecture
   :width: 600px

.. rubric:: Architectural Overview

EEG-Conformer is a *convolution-first* model augmented with a *lightweight transformer
encoder*. The end-to-end flow is:

- (i) :class:`_PatchEmbedding` converts the continuous EEG into a compact sequence of tokens via a
  :class:`ShallowFBCSPNet`-style temporal–spatial conv stem and temporal pooling;
- (ii) :class:`_TransformerEncoder` applies small multi-head self-attention to integrate
  longer-range temporal context across tokens;
- (iii) :class:`_ClassificationHead` aggregates the sequence and performs a linear readout.

This preserves the strong inductive biases of shallow CNN filter banks while adding
just enough attention to capture dependencies beyond the pooling horizon [song2022]_.

.. rubric:: Macro Components

- :class:`_PatchEmbedding` **(Shallow conv stem → tokens)**

  - *Operations.*

    - A temporal convolution (:class:`torch.nn.Conv2d`) with kernel ``(1 x L_t)`` forms a data-driven "filter bank";
    - a spatial convolution (:class:`torch.nn.Conv2d`) with kernel ``(n_chans x 1)`` projects across electrodes,
      collapsing the channel axis into a virtual channel;
    - **normalization** with :class:`torch.nn.BatchNorm2d`;
    - **activation** with :class:`torch.nn.ELU`;
    - **average pooling** with :class:`torch.nn.AvgPool2d` along time (kernel ``(1, P)`` with stride ``(1, S)``);
    - a final ``1x1`` :class:`torch.nn.Linear` projection.

  The result is rearranged to a token sequence ``(B, S_tokens, D)``, where ``D = n_filters_time``.

  *Interpretability/robustness.* Temporal kernels can be inspected as FIR filters;
  the spatial conv yields channel projections analogous to :class:`ShallowFBCSPNet`'s learned
  spatial filters. Temporal pooling stabilizes statistics and reduces sequence length.
  (A standalone shape sketch of this tokenization path follows this list.)

- :class:`_TransformerEncoder` **(context over temporal tokens)**

  - *Operations.*

    - A stack of ``num_layers`` encoder blocks (:class:`_TransformerEncoderBlock`).
    - Each block applies LayerNorm (:class:`torch.nn.LayerNorm`),
    - multi-head self-attention (``num_heads``) with dropout + residual (:class:`MultiHeadAttention`, :class:`torch.nn.Dropout`),
    - LayerNorm (:class:`torch.nn.LayerNorm`),
    - and a 2-layer feed-forward block (≈4x expansion, :class:`torch.nn.GELU`) with dropout + residual.

  Shapes remain ``(B, S_tokens, D)`` throughout.

  *Role.* Small attention focuses on interactions among *temporal patches* (not channels),
  extending effective receptive fields at modest cost.

- :class:`_ClassificationHead` **(aggregation + readout)**

  - *Operations.*

    - Flatten (:class:`torch.nn.Flatten`) the sequence to ``(B, S_tokens·D)``;
    - MLP (:class:`torch.nn.Linear` → activation (default: :class:`torch.nn.ELU`) → :class:`torch.nn.Dropout` → :class:`torch.nn.Linear`);
    - final Linear to classes.

  With ``return_features=True``, features before the last Linear can be exported for
  linear probing or downstream tasks.
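
The following is a minimal, self-contained sketch of the tokenization path described above,
written with plain ``torch`` modules for illustration only; layer names, kernel sizes, and the
exact ordering inside ``_PatchEmbedding`` may differ from braindecode's implementation.

.. code::

    import torch
    from torch import nn

    B, C, T = 2, 22, 1000          # batch, channels, samples
    D, L_t, P, S = 40, 25, 75, 15  # illustrative filter count, kernel and pool sizes

    x = torch.randn(B, 1, C, T)                    # (B, 1, n_chans, n_times)
    x = nn.Conv2d(1, D, kernel_size=(1, L_t))(x)   # temporal "filter bank"
    x = nn.Conv2d(D, D, kernel_size=(C, 1))(x)     # spatial projection across electrodes
    x = nn.ELU()(nn.BatchNorm2d(D)(x))             # normalization + activation
    x = nn.AvgPool2d((1, P), stride=(1, S))(x)     # temporal pooling into "patches"
    tokens = x.squeeze(2).transpose(1, 2)          # (B, S_tokens, D)
    print(tokens.shape)                            # torch.Size([2, 61, 40]) with these settings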

.. rubric:: Convolutional Details

- **Temporal (where time-domain patterns are learned).**
  The initial ``(1 x L_t)`` conv per channel acts as a *learned filter bank* for oscillatory
  bands and transients. Subsequent **AvgPool** along time performs local integration,
  converting activations into “patches” (tokens). Pool length/stride control the
  token rate and set the lower bound on temporal context within each token.

- **Spatial (how electrodes are processed).**
  A single conv with kernel ``(n_chans x 1)`` spans the full montage to learn spatial
  projections for each temporal feature map, collapsing the channel axis into a
  virtual channel before tokenization. This mirrors the shallow spatial step in
  :class:`ShallowFBCSPNet` (temporal filters → spatial projection → temporal condensation).

- **Spectral (how frequency content is captured).**
  No explicit Fourier/wavelet stage is used. Spectral selectivity emerges implicitly
  from the learned temporal kernels; pooling further smooths high-frequency noise.
  The effective spectral resolution is thus governed by ``L_t`` and the pooling
  configuration.
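
Because each temporal kernel is a plain FIR filter, its implied frequency response can be
examined with an FFT. The snippet below is a sketch using a random stand-in kernel; to inspect
a trained model you would substitute the weights of its temporal convolution.

.. code::

    import torch

    sfreq, L_t = 250.0, 25
    kernel = torch.randn(L_t)                        # stand-in for one learned temporal kernel
    freqs = torch.fft.rfftfreq(L_t, d=1.0 / sfreq)   # frequency bins in Hz
    response = torch.fft.rfft(kernel).abs()          # magnitude response of the FIR filter
    print(freqs.shape, response.shape)               # both of length L_t // 2 + 1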

.. rubric:: Attention / Sequential Modules

- **Type.** Standard multi-head self-attention (MHA) with ``num_heads`` heads over the token sequence.
- **Shapes.** Input/Output: ``(B, S_tokens, D)``; attention operates along the ``S_tokens`` axis.
- **Role.** Re-weights and integrates evidence across pooled windows, capturing dependencies
  longer than any single token while leaving channel relationships to the convolutional stem.
  The design is intentionally *small*—attention refines rather than replaces convolutional feature extraction.
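
The shape contract of this stage can be reproduced with PyTorch's built-in multi-head
attention; the embedding size and head count below are illustrative and need not match
braindecode's internal implementation.

.. code::

    import torch
    from torch import nn

    B, S_tokens, D, num_heads = 2, 61, 40, 10
    tokens = torch.randn(B, S_tokens, D)

    mha = nn.MultiheadAttention(embed_dim=D, num_heads=num_heads, batch_first=True)
    out, attn_weights = mha(tokens, tokens, tokens)   # self-attention over temporal tokens
    print(out.shape)                                  # torch.Size([2, 61, 40]), shape unchanged
    print(attn_weights.shape)                         # torch.Size([2, 61, 61]), token-to-token weights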

.. rubric:: Additional Mechanisms

- **Parallel with ShallowFBCSPNet.** Both begin with a learned temporal filter bank,
  spatial projection across electrodes, and early temporal condensation.
  :class:`ShallowFBCSPNet` then computes band-power (via squaring/log-variance), whereas
  EEG-Conformer applies BN/ELU and **continues with attention** over tokens to
  refine temporal context before classification.

- **Tokenization knob.** ``pool_time_length`` and especially ``pool_time_stride`` set
  the number of tokens ``S_tokens``. Smaller strides → more tokens and higher attention
  capacity (but higher compute); larger strides → fewer tokens and stronger inductive bias.
  (See the arithmetic sketch after this list.)

- **Embedding dimension = filters.** ``n_filters_time`` serves double duty as both the
  number of temporal filters in the stem and the transformer's embedding size ``D``,
  simplifying dimensional alignment.
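
As a rough rule of thumb (ignoring padding and other implementation details of the stem), the
token count follows standard convolution/pooling output-size arithmetic; the values below are
the same illustrative settings used in the tokenization sketch above.

.. code::

    n_times = 1000            # samples per window
    filter_time_length = 25   # L_t
    pool_time_length = 75     # P
    pool_time_stride = 15     # S

    t_after_conv = n_times - filter_time_length + 1   # valid temporal convolution
    s_tokens = (t_after_conv - pool_time_length) // pool_time_stride + 1
    print(s_tokens)           # 61 with these settings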

.. rubric:: Usage and Configuration

- **Instantiation.** Choose ``n_filters_time`` (embedding size ``D``) and
  ``filter_time_length`` to match the rhythms of interest. Tune
  ``pool_time_length/stride`` to trade temporal resolution for sequence length.
  Keep ``num_layers`` modest (e.g., 4–6) and set ``num_heads`` to divide ``D``.
  ``final_fc_length="auto"`` infers the flattened size from PatchEmbedding.
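
A minimal configuration sketch using the parameter names documented below; the concrete values
are illustrative choices, not recommended defaults.

.. code::

    from braindecode.models import EEGConformer

    model = EEGConformer(
        n_chans=22,
        n_outputs=4,
        n_times=1000,            # 4 s at 250 Hz
        n_filters_time=40,       # embedding size D
        filter_time_length=25,
        pool_time_length=75,
        pool_time_stride=15,
        num_layers=6,
        num_heads=10,            # must divide n_filters_time
        final_fc_length="auto",
    )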

Notes
-----
The authors recommend using data augmentation before using Conformer,
e.g. segmentation and recombination.
Please refer to the original paper and code for more details [ConformerCode]_.

The model was initially tuned on 4 seconds of 250 Hz data.
Please adjust the scale of the temporal convolutional layer
and the pooling layer for better performance.

.. versionadded:: 0.8

We aggregate the parameters based on the parts of the model, or
by where each parameter is first used, e.g. ``n_filters_time``.

.. versionadded:: 1.1

Parameters
----------
n_filters_time: int
    Number of temporal filters; also defines the embedding size.
filter_time_length: int
    Length of the temporal filter.
pool_time_length: int
    Length of the temporal pooling kernel.
pool_time_stride: int
    Stride between temporal pooling windows.
drop_prob: float
    Dropout rate of the convolutional layer.
num_layers: int
    Number of self-attention layers.
num_heads: int
    Number of attention heads.
att_drop_prob: float
    Dropout rate of the self-attention layer.
final_fc_length: int | str
    The dimension of the fully connected layer.
return_features: bool
    If True, the forward method returns the features before the
    last classification layer. Defaults to False.
activation: nn.Module
    Activation function as parameter. Default is ``nn.ELU``.
activation_transfor: nn.Module
    Activation function applied in the FeedForwardBlock module
    inside the transformer. Default is ``nn.GELU``.

References
----------
.. [song2022] Song, Y., Zheng, Q., Liu, B. and Gao, X., 2022. EEG
   Conformer: Convolutional transformer for EEG decoding and visualization.
   IEEE Transactions on Neural Systems and Rehabilitation Engineering,
   31, pp.710-719. https://ieeexplore.ieee.org/document/9991178
.. [ConformerCode] Song, Y., Zheng, Q., Liu, B. and Gao, X., 2022. EEG
   Conformer: Convolutional transformer for EEG decoding and visualization.
   https://github.com/eeyhsong/EEG-Conformer

.. rubric:: Hugging Face Hub integration

When the optional ``huggingface_hub`` package is installed, all models
automatically gain the ability to be pushed to and loaded from the
Hugging Face Hub. Install with::

    pip install braindecode[hub]

**Pushing a model to the Hub:**

.. code::

    from braindecode.models import EEGConformer

    # Train your model
    model = EEGConformer(n_chans=22, n_outputs=4, n_times=1000)
    # ... training code ...

    # Push to the Hub
    model.push_to_hub(
        repo_id="username/my-eegconformer-model",
        commit_message="Initial model upload",
    )

**Loading a model from the Hub:**

.. code::

    from braindecode.models import EEGConformer

    # Load pretrained model
    model = EEGConformer.from_pretrained("username/my-eegconformer-model")

    # Load with a different number of outputs (head is rebuilt automatically)
    model = EEGConformer.from_pretrained("username/my-eegconformer-model", n_outputs=4)

**Extracting features and replacing the head:**

.. code::

    import torch

    x = torch.randn(1, model.n_chans, model.n_times)
    # Extract encoder features (consistent dict across all models)
    out = model(x, return_features=True)
    features = out["features"]

    # Replace the classification head
    model.reset_head(n_outputs=10)

**Saving and restoring full configuration:**

.. code::

    import json

    config = model.get_config()  # all __init__ params
    with open("config.json", "w") as f:
        json.dump(config, f)

    model2 = EEGConformer.from_config(config)  # reconstruct (no weights)

All model parameters (both EEG-specific and model-specific such as
dropout rates, activation functions, number of filters) are automatically
saved to the Hub and restored when loading.

See :ref:`load-pretrained-models` for a complete tutorial.</main>
</div>

## Citation

Please cite both the original paper for this architecture (see the
*References* section above) and braindecode:

```bibtex
@article{aristimunha2025braindecode,
  title   = {Braindecode: a deep learning library for raw electrophysiological data},
  author  = {Aristimunha, Bruno and others},
  journal = {Zenodo},
  year    = {2025},
  doi     = {10.5281/zenodo.17699192},
}
```

## License

BSD-3-Clause for the model code (matching braindecode).
If you fine-tune from a published checkpoint, the resulting weights
inherit the license of that checkpoint and its training corpus.