@@ -14,13 +14,12 @@ tags:
 # EEGConformer
-EEG Conformer from Song et al. (2022).
-> **Architecture-only repository.** This repo documents the
 > `braindecode.models.EEGConformer` class. **No pretrained weights are
-> distributed here** — instantiate the model and train it on your own
-> data, or fine-tune from a published foundation-model checkpoint
-> separately.
 ## Quick start
@@ -39,268 +38,48 @@ model = EEGConformer(
 )
 ```
-The signal-shape arguments above are example defaults — adjust them
-to match your recording.
 ## Documentation
-- Full API reference (parameters, references, architecture figure):
-  <https://braindecode.org/stable/generated/braindecode.models.EEGConformer.html>
-- Interactive browser with live instantiation:
   <https://huggingface.co/spaces/braindecode/model-explorer>
 - Source on GitHub: <https://github.com/braindecode/braindecode/blob/master/braindecode/models/eegconformer.py#L14>
-## Architecture description
-The block below is the rendered class docstring (parameters,
-references, architecture figure where available).
-<div class='bd-doc'><main>
-<p>EEG Conformer from Song et al (2022) [song2022]_.</p>
-<span style="display:inline-block;padding:2px 8px;border-radius:4px;background:#5cb85c;color:white;font-size:11px;font-weight:600;margin-right:4px;">Convolution</span><span style="display:inline-block;padding:2px 8px;border-radius:4px;background:#56B4E9;color:white;font-size:11px;font-weight:600;margin-right:4px;">Attention/Transformer</span>
- .. figure:: https://raw.githubusercontent.com/eeyhsong/EEG-Conformer/refs/heads/main/visualization/Fig1.png
-     :align: center
-     :alt: EEGConformer Architecture
-     :width: 600px
- .. rubric:: Architectural Overview
- EEG-Conformer is a *convolution-first* model augmented with a *lightweight transformer
- encoder*. The end-to-end flow is:
- - (i) :class:`_PatchEmbedding` converts the continuous EEG into a compact sequence of tokens via a
-   :class:`ShallowFBCSPNet` temporal–spatial conv stem and temporal pooling;
- - (ii) :class:`_TransformerEncoder` applies small multi-head self-attention to integrate
-   longer-range temporal context across tokens;
- - (iii) :class:`_ClassificationHead` aggregates the sequence and performs a linear readout.
-   This preserves the strong inductive biases of shallow CNN filter banks while adding
-   just enough attention to capture dependencies beyond the pooling horizon [song2022]_.
- .. rubric:: Macro Components
- - :class:`_PatchEmbedding` **(Shallow conv stem → tokens)**
-     - *Operations.*
-     - A temporal convolution (:class:`torch.nn.Conv2d`) ``(1 x L_t)`` forms a data-driven "filter bank";
-     - A spatial convolution (:class:`torch.nn.Conv2d`) ``(n_chans x 1)`` projects across electrodes,
-       collapsing the channel axis into a virtual channel.
-     - **Normalization function** :class:`torch.nn.BatchNorm2d`
-     - **Activation function** :class:`torch.nn.ELU`
-     - **Average Pooling** :class:`torch.nn.AvgPool2d` along time (kernel ``(1, P)`` with stride ``(1, S)``)
-     - A final ``1x1`` :class:`torch.nn.Linear` projection.
- The result is rearranged to a token sequence ``(B, S_tokens, D)``, where ``D = n_filters_time``.
- *Interpretability/robustness.* Temporal kernels can be inspected as FIR filters;
- the spatial conv yields channel projections analogous to :class:`ShallowFBCSPNet`'s learned
- spatial filters. Temporal pooling stabilizes statistics and reduces sequence length.
- - :class:`_TransformerEncoder` **(context over temporal tokens)**
-     - *Operations.*
-     - A stack of ``num_layers`` encoder blocks. :class:`_TransformerEncoderBlock`
-     - Each block applies LayerNorm :class:`torch.nn.LayerNorm`
-     - Multi-Head Self-Attention (``num_heads``) with dropout + residual :class:`MultiHeadAttention` (:class:`torch.nn.Dropout`)
-     - LayerNorm :class:`torch.nn.LayerNorm`
-     - 2-layer feed-forward (≈4x expansion, :class:`torch.nn.GELU`) with dropout + residual.
- Shapes remain ``(B, S_tokens, D)`` throughout.
- *Role.* Small attention focuses on interactions among *temporal patches* (not channels),
- extending effective receptive fields at modest cost.
- - :class:`_ClassificationHead` **(aggregation + readout)**
-     - *Operations.*
-     - Flatten (:class:`torch.nn.Flatten`) the sequence to ``(B, S_tokens·D)``
-     - MLP (:class:`torch.nn.Linear` → activation (default: :class:`torch.nn.ELU`) → :class:`torch.nn.Dropout` → :class:`torch.nn.Linear`)
-     - final Linear to classes.
- With ``return_features=True``, features before the last Linear can be exported for
- linear probing or downstream tasks.
- .. rubric:: Convolutional Details
- - **Temporal (where time-domain patterns are learned).**
-     The initial ``(1 x L_t)`` conv per channel acts as a *learned filter bank* for oscillatory
-     bands and transients. Subsequent **AvgPool** along time performs local integration,
-     converting activations into “patches” (tokens). Pool length/stride control the
-     token rate and set the lower bound on temporal context within each token.
- - **Spatial (how electrodes are processed).**
-     A single conv with kernel ``(n_chans x 1)`` spans the full montage to learn spatial
-     projections for each temporal feature map, collapsing the channel axis into a
-     virtual channel before tokenization. This mirrors the shallow spatial step in
-     :class:`ShallowFBCSPNet` (temporal filters → spatial projection → temporal condensation).
- - **Spectral (how frequency content is captured).**
-     No explicit Fourier/wavelet stage is used. Spectral selectivity emerges implicitly
-     from the learned temporal kernels; pooling further smooths high-frequency noise.
-     The effective spectral resolution is thus governed by ``L_t`` and the pooling
-     configuration.
- .. rubric:: Attention / Sequential Modules
- - **Type.** Standard multi-head self-attention (MHA) with ``num_heads`` heads over the token sequence.
- - **Shapes.** Input/Output: ``(B, S_tokens, D)``; attention operates along the ``S_tokens`` axis.
- - **Role.** Re-weights and integrates evidence across pooled windows, capturing dependencies
-   longer than any single token while leaving channel relationships to the convolutional stem.
-   The design is intentionally *small*—attention refines rather than replaces convolutional feature extraction.
- .. rubric:: Additional Mechanisms
- - **Parallel with ShallowFBCSPNet.** Both begin with a learned temporal filter bank,
-     spatial projection across electrodes, and early temporal condensation.
-     :class:`ShallowFBCSPNet` then computes band-power (via squaring/log-variance), whereas
-     EEG-Conformer applies BN/ELU and **continues with attention** over tokens to
-     refine temporal context before classification.
- - **Tokenization knob.** ``pool_time_length`` and especially ``pool_time_stride`` set
-     the number of tokens ``S_tokens``. Smaller strides → more tokens and higher attention
-     capacity (but higher compute); larger strides → fewer tokens and stronger inductive bias.
- - **Embedding dimension = filters.** ``n_filters_time`` serves double duty as both the
-     number of temporal filters in the stem and the transformer's embedding size ``D``,
-     simplifying dimensional alignment.
- .. rubric:: Usage and Configuration
- - **Instantiation.** Choose ``n_filters_time`` (embedding size ``D``) and
-     ``filter_time_length`` to match the rhythms of interest. Tune
-     ``pool_time_length/stride`` to trade temporal resolution for sequence length.
-     Keep ``num_layers`` modest (e.g., 4–6) and set ``num_heads`` to divide ``D``.
-     ``final_fc_length="auto"`` infers the flattened size from PatchEmbedding.
- Notes
- -----
- The authors recommend using data augmentation before using Conformer,
- e.g. segmentation and recombination.
- Please refer to the original paper and code for more details [ConformerCode]_.
- The model was initially tuned on 4 seconds of 250 Hz data.
- Please adjust the scale of the temporal convolutional layer,
- and the pooling layer for better performance.
- .. versionadded:: 0.8
- We aggregate the parameters based on the parts of the models, or
- when the parameters were used first, e.g. ``n_filters_time``.
- .. versionadded:: 1.1
- Parameters
- ----------
- n_filters_time: int
-     Number of temporal filters; also defines the embedding size.
- filter_time_length: int
-     Length of the temporal filter.
- pool_time_length: int
-     Length of temporal pooling filter.
- pool_time_stride: int
-     Length of stride between temporal pooling filters.
- drop_prob: float
-     Dropout rate of the convolutional layer.
- num_layers: int
-     Number of self-attention layers.
- num_heads: int
-     Number of attention heads.
- att_drop_prob: float
-     Dropout rate of the self-attention layer.
- final_fc_length: int | str
-     The dimension of the fully connected layer.
- return_features: bool
-     If True, the forward method returns the features before the
-     last classification layer. Defaults to False.
- activation: nn.Module
-     Activation function as parameter. Default is nn.ELU
- activation_transfor: nn.Module
-     Activation function as parameter, applied at the FeedForwardBlock module
-     inside the transformer. Default is nn.GELU.
- References
- ----------
- .. [song2022] Song, Y., Zheng, Q., Liu, B. and Gao, X., 2022. EEG
-    conformer: Convolutional transformer for EEG decoding and visualization.
-    IEEE Transactions on Neural Systems and Rehabilitation Engineering,
-    31, pp.710-719. https://ieeexplore.ieee.org/document/9991178
- .. [ConformerCode] Song, Y., Zheng, Q., Liu, B. and Gao, X., 2022. EEG
-    conformer: Convolutional transformer for EEG decoding and visualization.
-    https://github.com/eeyhsong/EEG-Conformer.
- .. rubric:: Hugging Face Hub integration
- When the optional ``huggingface_hub`` package is installed, all models
- automatically gain the ability to be pushed to and loaded from the
- Hugging Face Hub. Install with::
-     pip install braindecode[hub]
- **Pushing a model to the Hub:**
- .. code::
-     from braindecode.models import EEGConformer
-     # Train your model
-     model = EEGConformer(n_chans=22, n_outputs=4, n_times=1000)
-     # ... training code ...
-     # Push to the Hub
-     model.push_to_hub(
-         repo_id="username/my-eegconformer-model",
-         commit_message="Initial model upload",
-     )
- **Loading a model from the Hub:**
- .. code::
-     from braindecode.models import EEGConformer
-     # Load pretrained model
-     model = EEGConformer.from_pretrained("username/my-eegconformer-model")
-     # Load with a different number of outputs (head is rebuilt automatically)
-     model = EEGConformer.from_pretrained("username/my-eegconformer-model", n_outputs=4)
- **Extracting features and replacing the head:**
- .. code::
-     import torch
-     x = torch.randn(1, model.n_chans, model.n_times)
-     # Extract encoder features (consistent dict across all models)
-     out = model(x, return_features=True)
-     features = out["features"]
-     # Replace the classification head
-     model.reset_head(n_outputs=10)
- **Saving and restoring full configuration:**
- .. code::
-     import json
-     config = model.get_config()            # all __init__ params
-     with open("config.json", "w") as f:
-         json.dump(config, f)
-     model2 = EEGConformer.from_config(config)    # reconstruct (no weights)
- All model parameters (both EEG-specific and model-specific such as
- dropout rates, activation functions, number of filters) are automatically
- saved to the Hub and restored when loading.
- See :ref:`load-pretrained-models` for a complete tutorial.</main>
-</div>
 ## Citation
-Please cite both the original paper for this architecture (see the
-*References* section above) and braindecode:
 ```bibtex
 @article{aristimunha2025braindecode,

 # EEGConformer
+EEG Conformer from Song et al. (2022) [song2022].
+> **Architecture-only repository.** Documents the
 > `braindecode.models.EEGConformer` class. **No pretrained weights are
+> distributed here.** Instantiate the model and train it on your own
+> data.
 ## Quick start
 )
 ```
+The signal-shape arguments above are illustrative defaults — adjust to
+match your recording.
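One way to derive the signal-shape arguments from a recording is sketched below. It assumes fixed-length windows and that `n_times` counts samples per window. The helper name `signal_shape` is hypothetical, and the 22-channel, 4 s, 250 Hz values are illustrative only.

```python
def signal_shape(n_chans: int, sfreq: float, window_seconds: float) -> dict:
    """Return keyword arguments describing one EEG window (hypothetical helper)."""
    # Samples per window = sampling rate (Hz) * window duration (s).
    n_times = int(round(sfreq * window_seconds))
    return {"n_chans": n_chans, "n_times": n_times}


# Illustrative recording: 22 electrodes, 250 Hz, 4-second windows.
shape_kwargs = signal_shape(n_chans=22, sfreq=250.0, window_seconds=4.0)
print(shape_kwargs)
# The result could then be passed on, e.g. EEGConformer(n_outputs=4, **shape_kwargs)
```

Recomputing `n_times` whenever the sampling rate or window length changes keeps the model's expected input shape in step with the data.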
 ## Documentation
+- Full API reference: <https://braindecode.org/stable/generated/braindecode.models.EEGConformer.html>
+- Interactive browser (live instantiation, parameter counts):
   <https://huggingface.co/spaces/braindecode/model-explorer>
 - Source on GitHub: <https://github.com/braindecode/braindecode/blob/master/braindecode/models/eegconformer.py#L14>
+## Architecture
+![EEGConformer architecture](https://raw.githubusercontent.com/eeyhsong/EEG-Conformer/refs/heads/main/visualization/Fig1.png)
+## Parameters
+| Parameter | Type | Description |
+|---|---|---|
+| `n_filters_time` | `int` | Number of temporal filters; also defines the embedding size. |
+| `filter_time_length` | `int` | Length of the temporal filter. |
+| `pool_time_length` | `int` | Length of the temporal pooling filter. |
+| `pool_time_stride` | `int` | Stride between temporal pooling filters. |
+| `drop_prob` | `float` | Dropout rate of the convolutional layer. |
+| `num_layers` | `int` | Number of self-attention layers. |
+| `num_heads` | `int` | Number of attention heads. |
+| `att_drop_prob` | `float` | Dropout rate of the self-attention layer. |
+| `final_fc_length` | `int \| str` | The dimension of the fully connected layer. |
+| `return_features` | `bool` | If True, the forward method returns the features before the last classification layer. Defaults to False. |
+| `activation` | `nn.Module` | Activation function. Default is `nn.ELU`. |
+| `activation_transfor` | `nn.Module` | Activation function applied in the FeedForwardBlock module inside the transformer. Default is `nn.GELU`. |
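The filter and pooling parameters together determine how many temporal tokens reach the transformer. The sketch below works through that arithmetic, assuming an unpadded temporal convolution with stride 1 followed by unpadded average pooling. The helper name `n_tokens` is hypothetical, and the example values are illustrative and not read from the library.

```python
def n_tokens(n_times: int, filter_time_length: int,
             pool_time_length: int, pool_time_stride: int) -> int:
    """Sequence length after an unpadded temporal conv and average pooling."""
    # Unpadded conv with stride 1 shortens the signal by (kernel - 1) samples.
    after_conv = n_times - filter_time_length + 1
    if after_conv < pool_time_length:
        raise ValueError("window too short for this filter/pool configuration")
    # Standard pooling output length: floor((L - kernel) / stride) + 1.
    return (after_conv - pool_time_length) // pool_time_stride + 1


# Illustrative values only (assumptions, not confirmed defaults).
tokens = n_tokens(n_times=1000, filter_time_length=25,
                  pool_time_length=75, pool_time_stride=15)
n_filters_time = 40                   # embedding size per token
flattened = tokens * n_filters_time   # flattened feature size fed to the head
print(tokens, flattened)
```

Under these assumptions a larger `pool_time_stride` yields fewer tokens, which shrinks both the attention cost and the flattened feature size.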
+## References
+1. Song, Y., Zheng, Q., Liu, B. and Gao, X., 2022. EEG conformer: Convolutional transformer for EEG decoding and visualization. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 31, pp.710-719. https://ieeexplore.ieee.org/document/9991178
+2. Song, Y., Zheng, Q., Liu, B. and Gao, X., 2022. EEG conformer: Convolutional transformer for EEG decoding and visualization. https://github.com/eeyhsong/EEG-Conformer.
 ## Citation
+Cite the original architecture paper (see *References* above) and braindecode:
 ```bibtex
 @article{aristimunha2025braindecode,