bruAristimunha committed · a5ea1ce · verified · 1 Parent(s): 90a6a4f

Replace with clean markdown card

Files changed (1): README.md +36 −295
README.md CHANGED
@@ -14,13 +14,12 @@ tags:
 
 # ATCNet
 
- ATCNet from Altaheri et al (2022) [1]_.
 
- > **Architecture-only repository.** This repo documents the
 > `braindecode.models.ATCNet` class. **No pretrained weights are
- > distributed here** instantiate the model and train it on your own
- > data, or fine-tune from a published foundation-model checkpoint
- > separately.
 
 ## Quick start
 
@@ -39,313 +38,55 @@ model = ATCNet(
 )
 ```
 
- The signal-shape arguments above are example defaults — adjust them
- to match your recording.
 
 ## Documentation
-
- - Full API reference (parameters, references, architecture figure):
-   <https://braindecode.org/stable/generated/braindecode.models.ATCNet.html>
- - Interactive browser with live instantiation:
   <https://huggingface.co/spaces/braindecode/model-explorer>
 - Source on GitHub: <https://github.com/braindecode/braindecode/blob/master/braindecode/models/atcnet.py#L15>
 
- ## Architecture description
-
- The block below is the rendered class docstring (parameters,
- references, architecture figure where available).
-
- ATCNet from Altaheri et al (2022) [1]_.
-
- Tags: Convolution · Recurrent · Attention/Transformer
-
- .. figure:: https://user-images.githubusercontent.com/25565236/185449791-e8539453-d4fa-41e1-865a-2cf7e91f60ef.png
-    :align: center
-    :alt: ATCNet Architecture
-    :width: 650px
-
- .. rubric:: Architectural Overview
-
- ATCNet is a *convolution-first* architecture augmented with a *lightweight attention–TCN*
- sequence module. The end-to-end flow is:
-
- - (i) :class:`_ConvBlock` learns temporal filter banks and spatial projections (EEGNet-style),
-   downsampling time to a compact feature map;
- - (ii) sliding windows carve overlapping temporal windows from this map;
- - (iii) for each window, :class:`_AttentionBlock` applies small multi-head self-attention
-   over time, followed by a :class:`_TCNResidualBlock` stack (causal, dilated);
- - (iv) window-level features are aggregated (mean of window logits or concatenation)
-   and mapped via a max-norm–constrained linear layer.
-
- Relative to ViT, ATCNet replaces linear patch projection with learned *temporal–spatial*
- convolutions; it processes *parallel* window encoders (attention→TCN) instead of a deep
- stack; and swaps the MLP head for a TCN suited to 1-D EEG sequences. A shape-level
- sketch of this flow follows below.
-
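To make steps (i)-(iv) concrete, here is a shape-level sketch in plain PyTorch. The sizes (`B`, `F2`, `T_c`, `n`) are illustrative stand-ins and the attention/TCN branch is stubbed out; this is not the library's implementation:

```python
import torch

B, F2, T_c, n = 8, 32, 20, 5            # illustrative: batch, stem features, condensed time, windows
feature_map = torch.randn(B, F2, T_c)   # (i) stand-in for the conv stem's output

T_w = T_c - n + 1                       # (ii) window width: one start index per window
window_feats = []
for w in range(n):
    window = feature_map[..., w:w + T_w]     # (B, F2, T_w)
    attended = window                        # (iii) attention + TCN branch stubbed out here
    window_feats.append(attended[..., -1])   # last causal step as the window feature -> (B, F2)

pooled = torch.stack(window_feats).mean(dim=0)   # (iv) aggregate before the max-norm readout
```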
- .. rubric:: Macro Components
-
- - :class:`_ConvBlock` **(shallow conv stem → feature map)**
-
-   *Operations.*
-
-   - **Temporal conv** (:class:`torch.nn.Conv2d`) with kernel ``(L_t, 1)`` builds a
-     FIR-like filter bank (``F1`` maps).
-   - **Depthwise spatial conv** (:class:`torch.nn.Conv2d`, ``groups=F1``) with kernel
-     ``(1, n_chans)`` learns per-filter spatial projections (akin to EEGNet's CSP-like step).
-   - **BN → ELU → AvgPool → Dropout** to stabilize and condense activations.
-   - **Refining temporal conv** (:class:`torch.nn.Conv2d`) with kernel ``(L_r, 1)`` +
-     **BN → ELU → AvgPool → Dropout**.
-
-   The output shape is ``(B, F2, T_c, 1)`` with ``F2 = F1·D`` and ``T_c = T/(P1·P2)``.
-   Temporal kernels behave as FIR filters; the depthwise spatial conv yields frequency-specific
-   topographies. Pooling acts as a local integrator, reducing variance and imposing a
-   useful inductive bias on short EEG windows. A stem sketch follows below.
-
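A minimal sketch of the stem under the shapes above, assuming a `(B, 1, T, C)` input layout and symmetric padding; kernel and pool sizes follow the documented defaults (`L_t=64`, `L_r=16`, `P1=8`, `P2=7`), but this is not the exact braindecode code:

```python
import torch
from torch import nn

F1, D, n_chans = 16, 2, 22
F2 = F1 * D
L_t, L_r, P1, P2 = 64, 16, 8, 7

stem = nn.Sequential(
    # temporal conv: FIR-like filter bank over time, input laid out as (B, 1, T, C)
    nn.Conv2d(1, F1, kernel_size=(L_t, 1), padding=(L_t // 2, 0), bias=False),
    # depthwise spatial conv across the full montage, one projection per temporal filter
    nn.Conv2d(F1, F2, kernel_size=(1, n_chans), groups=F1, bias=False),
    nn.BatchNorm2d(F2), nn.ELU(), nn.AvgPool2d((P1, 1)), nn.Dropout(0.3),
    # refining temporal conv + second condensation stage
    nn.Conv2d(F2, F2, kernel_size=(L_r, 1), padding=(L_r // 2, 0), bias=False),
    nn.BatchNorm2d(F2), nn.ELU(), nn.AvgPool2d((P2, 1)), nn.Dropout(0.3),
)

x = torch.randn(8, 1, 1125, n_chans)   # (B, 1, T, C): 4.5 s at 250 Hz
print(stem(x).shape)                   # torch.Size([8, 32, 20, 1]) ~ (B, F2, T/(P1*P2), 1)
```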
- - **Sliding-Window Sequencer**
-
-   From the condensed time axis (length ``T_c``), ATCNet forms ``n`` overlapping windows
-   of width ``T_w = T_c - n + 1`` (one start per index). Each window produces a sequence
-   ``(B, F2, T_w)`` forwarded to its own attention–TCN branch. This creates *parallel*
-   encoders over shifted contexts and is key to robustness on nonstationary EEG.
-
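One way to materialize all `n` windows at once is `Tensor.unfold`; an illustrative equivalent of the per-start-index slicing described above, not the library's code:

```python
import torch

B, F2, T_c, n = 8, 32, 20, 5
T_w = T_c - n + 1                       # = 16
fmap = torch.randn(B, F2, T_c)

# unfold over time with size T_w and step 1: one window per start index
windows = fmap.unfold(-1, T_w, 1)       # (B, F2, n, T_w)
assert windows.shape == (B, F2, n, T_w)
```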
- - :class:`_AttentionBlock` **(small MHA on temporal positions)**
-
-   Attention here is *local to a window* and purely temporal.
-
-   *Operations.*
-
-   - Rearrange to ``(B, T_w, F2)``,
-   - normalization with :class:`torch.nn.LayerNorm`,
-   - custom multi-head attention :class:`_MHA` (``num_heads=H``, per-head dim ``d_h``) + residual add,
-   - dropout with :class:`torch.nn.Dropout`,
-   - rearrange back to ``(B, F2, T_w)``.
-
-   *Role.* Re-weights evidence across the window, letting the model emphasize informative
-   segments (onsets, bursts) before causal convolutions aggregate history.
-
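A sketch of the same sequence using `torch.nn.MultiheadAttention` as a stand-in for the custom `_MHA` (an assumption: this forces `embed_dim = F2`, whereas `_MHA` decouples `H·d_h` from `F2`):

```python
import torch
from torch import nn

B, F2, T_w, H = 8, 32, 16, 2
x = torch.randn(B, F2, T_w)

norm = nn.LayerNorm(F2)
mha = nn.MultiheadAttention(embed_dim=F2, num_heads=H, batch_first=True)
drop = nn.Dropout(0.5)

t = x.permute(0, 2, 1)                    # (B, T_w, F2): attention runs over time
attn_out, _ = mha(norm(t), norm(t), norm(t))
t = t + drop(attn_out)                    # residual add
y = t.permute(0, 2, 1)                    # back to (B, F2, T_w)
```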
- - :class:`_TCNResidualBlock` **(causal dilated temporal CNN)**
-
-   *Operations:*
-
-   - Two :class:`braindecode.modules.CausalConv1d` layers per block, with dilation ``1, 2, 4, …``
-     across blocks, each followed by :class:`torch.nn.ELU` + :class:`torch.nn.BatchNorm1d` +
-     :class:`torch.nn.Dropout`, plus a residual connection (identity or 1x1 mapping).
-   - The final feature used per window is the *last* causal step ``[..., -1]`` (forecast-style).
-
-   *Role.* Efficient long-range temporal integration with stable gradients; the dilated
-   receptive field complements attention's soft selection.
-
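A minimal causal residual block in this spirit, assuming equal input/output channels so the residual is an identity; left-only padding plays the role of `CausalConv1d`:

```python
import torch
from torch import nn

class CausalTCNBlock(nn.Module):
    """Illustrative causal residual block (left-padded convs, dilation d)."""
    def __init__(self, channels: int, kernel_size: int, dilation: int, p: float = 0.3):
        super().__init__()
        pad = (kernel_size - 1) * dilation            # pad only on the left => causal
        self.pad = nn.ConstantPad1d((pad, 0), 0.0)
        self.conv1 = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)
        self.conv2 = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)
        self.bn1, self.bn2 = nn.BatchNorm1d(channels), nn.BatchNorm1d(channels)
        self.act, self.drop = nn.ELU(), nn.Dropout(p)

    def forward(self, x):
        h = self.drop(self.act(self.bn1(self.conv1(self.pad(x)))))
        h = self.drop(self.act(self.bn2(self.conv2(self.pad(h)))))
        return self.act(h + x)                        # identity residual

x = torch.randn(8, 32, 16)                            # (B, F2, T_w)
block = CausalTCNBlock(channels=32, kernel_size=4, dilation=1)
feature = block(x)[..., -1]                           # last causal step -> (8, 32)
```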
- - **Aggregation & Classifier**
-
-   *Operations:*
-
-   - Either (a) map each window feature ``(B, F2)`` to logits via :class:`braindecode.modules.MaxNormLinear`
-     and **average** across windows (default, matching the official code), or
-   - (b) **concatenate** all window features into ``(B, n·F2)`` and apply a single :class:`MaxNormLinear`.
-
-   The max-norm constraint regularizes the readout.
-
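Both aggregation modes, sketched with a plain `torch.nn.Linear` standing in for `MaxNormLinear` (i.e. without the max-norm constraint):

```python
import torch
from torch import nn

B, F2, n, n_outputs = 8, 32, 5, 4
window_feats = [torch.randn(B, F2) for _ in range(n)]

# (a) average per-window logits (default, matching the official code)
head = nn.Linear(F2, n_outputs)                  # stand-in for MaxNormLinear
logits_avg = torch.stack([head(f) for f in window_feats]).mean(dim=0)

# (b) concatenate window features, single readout (paper's variant, concat=True)
head_cat = nn.Linear(n * F2, n_outputs)
logits_cat = head_cat(torch.cat(window_feats, dim=1))
```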
- .. rubric:: Convolutional Details
-
- - **Temporal.** Temporal structure is learned in three places:
-
-   - (1) the stem's wide ``(L_t, 1)`` conv (learned filter bank),
-   - (2) the refining ``(L_r, 1)`` conv after pooling (short-term dynamics), and
-   - (3) the TCN's causal 1-D convolutions with exponentially increasing dilation
-     (long-range dependencies). The minimum sequence length required by the TCN stack is
-     ``(K_t - 1)·2^{L-1} + 1``; the implementation *auto-scales* kernels/pools/windows
-     when inputs are shorter to preserve feasibility.
-
- - **Spatial.** A depthwise spatial conv spans the **full montage** (kernel ``(1, n_chans)``),
-   producing *per-temporal-filter* spatial projections (no cross-filter mixing at this step).
-   This mirrors EEGNet's interpretability: each temporal filter has its own spatial pattern.
-
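For the defaults (`K_t = 4`, `L = 2`), that minimum works out to `(4 - 1)·2^(2-1) + 1 = 7` condensed steps; a quick check:

```python
def tcn_min_length(kernel_size: int, depth: int) -> int:
    """Minimum condensed sequence length the TCN stack can consume."""
    return (kernel_size - 1) * 2 ** (depth - 1) + 1

assert tcn_min_length(kernel_size=4, depth=2) == 7
```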
- .. rubric:: Attention / Sequential Modules
-
- - **Type.** Multi-head self-attention with ``H`` heads and per-head dim ``d_h``, implemented
-   in :class:`_MHA`, allowing ``embed_dim = H·d_h`` independent of the input and output dims.
- - **Shapes.** ``(B, F2, T_w) → (B, T_w, F2) → (B, F2, T_w)``. Attention operates along
-   the **temporal** axis within a window; channels/features stay in the embedding dim ``F2``.
- - **Role.** Highlights salient temporal positions prior to causal convolution; small attention
-   keeps compute modest while improving context modeling over pooled features.
-
- .. rubric:: Additional Mechanisms
-
- - **Parallel encoders over shifted windows.** Improves montage/phase robustness by
-   ensembling nearby contexts rather than committing to a single segmentation.
- - **Max-norm classifier.** Enforces weight-norm constraints at the readout, a common
-   stabilization trick in EEG decoding.
- - **ViT vs. ATCNet (design choices).** Convolutional *nonlinear* projection rather than
-   linear patchification; attention followed by a **TCN** (not an MLP); *parallel* window
-   encoders rather than stacked encoders.
-
- .. rubric:: Usage and Configuration
-
- - ``conv_block_n_filters`` (F1) and ``conv_block_depth_mult`` (D) → capacity of the stem
-   (with ``F2 = F1·D`` feeding attention/TCN), dimensions aligned to ``F2``, like :class:`EEGNet`.
- - Pool sizes ``P1, P2`` trade temporal resolution for stability/compute; they set
-   ``T_c = T/(P1·P2)`` and thus the window width ``T_w``.
- - ``n_windows`` controls the ensemble over shifts (compute ∝ number of windows).
- - ``num_heads`` and ``head_dim`` set attention capacity; keep ``H·d_h ≈ F2``.
- - ``tcn_depth`` and ``tcn_kernel_size`` govern the receptive field; larger values demand
-   longer inputs (see the minimum length above). The implementation warns and *rescales*
-   kernels/pools/windows if inputs are too short.
- - **Aggregation choice.** ``concat=False`` (default, average of per-window logits) matches
-   the official code; ``concat=True`` mirrors the paper's concatenation variant.
-
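For instance, a wider variant that respects these guidelines; the values here are arbitrary examples, not recommendations:

```python
from braindecode.models import ATCNet

# a wider stem: F2 = 32 * 2 = 64 features feed each attention/TCN branch
model = ATCNet(
    n_chans=22, n_outputs=4, n_times=1125,
    conv_block_n_filters=32,   # F1
    conv_block_depth_mult=2,   # D
    n_windows=3,               # fewer parallel branches -> less compute
    num_heads=4, head_dim=16,  # keep H * d_h = 64 ~ F2
)
```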
- Parameters
- ----------
- input_window_seconds : float, optional
-     Time length of inputs, in seconds. Defaults to 4.5 s, as in the BCI-IV 2a
-     dataset.
- sfreq : int, optional
-     Sampling frequency of the inputs, in Hz. Defaults to 250 Hz, as in the
-     BCI-IV 2a dataset.
- conv_block_n_filters : int
-     Number of temporal filters in the first convolutional layer of the
-     convolutional block, denoted F1 in figure 2 of the paper [1]_. Defaults
-     to 16 as in [1]_.
- conv_block_kernel_length_1 : int
-     Length of temporal filters in the first convolutional layer of the
-     convolutional block, denoted Kc in table 1 of the paper [1]_. Defaults
-     to 64 as in [1]_.
- conv_block_kernel_length_2 : int
-     Length of temporal filters in the last convolutional layer of the
-     convolutional block. Defaults to 16 as in [1]_.
- conv_block_pool_size_1 : int
-     Length of the first average pooling kernel in the convolutional block.
-     Defaults to 8 as in [1]_.
- conv_block_pool_size_2 : int
-     Length of the second average pooling kernel in the convolutional block,
-     denoted P2 in table 1 of the paper [1]_. Defaults to 7 as in [1]_.
- conv_block_depth_mult : int
-     Depth multiplier of the depthwise convolution in the convolutional block,
-     denoted D in table 1 of the paper [1]_. Defaults to 2 as in [1]_.
- conv_block_dropout : float
-     Dropout probability used in the convolution block, denoted pc in
-     table 1 of the paper [1]_. Defaults to 0.3 as in [1]_.
- n_windows : int
-     Number of sliding windows, denoted n in [1]_. Defaults to 5 as in [1]_.
- head_dim : int
-     Embedding dimension used in each self-attention head, denoted dh in
-     table 1 of the paper [1]_. Defaults to 8 as in [1]_.
- num_heads : int
-     Number of attention heads, denoted H in table 1 of the paper [1]_.
-     Defaults to 2 as in [1]_.
- att_dropout : float
-     Dropout probability used in the attention block, denoted pa in table 1
-     of the paper [1]_. Defaults to 0.5 as in [1]_.
- tcn_depth : int
-     Depth of the Temporal Convolutional Network block (i.e. number of TCN
-     residual blocks), denoted L in table 1 of the paper [1]_. Defaults to 2
-     as in [1]_.
- tcn_kernel_size : int
-     Temporal kernel size used in the TCN block, denoted Kt in table 1 of the
-     paper [1]_. Defaults to 4 as in [1]_.
- tcn_dropout : float
-     Dropout probability used in the TCN block, denoted pt in table 1
-     of the paper [1]_. Defaults to 0.3 as in [1]_.
- tcn_activation : torch.nn.Module
-     Nonlinear activation to use. Defaults to ``nn.ELU()``.
- concat : bool
-     When ``True``, concatenates each sliding-window embedding before
-     feeding it to a fully-connected layer, as done in [1]_. When ``False``,
-     maps each sliding window to ``n_outputs`` logits and averages them.
-     Defaults to ``False``, contrary to what is reported in [1]_ but
-     matching what the official code does [2]_.
- max_norm_const : float
-     Maximum L2-norm constraint imposed on the weights of the last
-     fully-connected layer. Defaults to 0.25.
-
- Notes
- -----
- - Inputs substantially shorter than the implied minimum length trigger **automatic
-   downscaling** of kernels, pools, windows, and TCN kernel size to maintain validity.
- - The attention–TCN sequence operates **per window**; the last causal step is used as the
-   window feature, aligning the temporal semantics across windows.
-
- .. versionadded:: 1.1
-
-    More detailed documentation of the model.
-
- References
- ----------
- .. [1] H. Altaheri, G. Muhammad, M. Alsulaiman (2022).
-    *Physics-informed attention temporal convolutional network for EEG-based motor imagery classification.*
-    IEEE Transactions on Industrial Informatics. doi:10.1109/TII.2022.3197419.
- .. [2] Official EEG-ATCNet implementation (TensorFlow):
-    https://github.com/Altaheri/EEG-ATCNet/blob/main/models.py
-
- .. rubric:: Hugging Face Hub integration
-
- When the optional ``huggingface_hub`` package is installed, all models
- automatically gain the ability to be pushed to and loaded from the
- Hugging Face Hub. Install with::
-
-     pip install braindecode[hub]
-
- **Pushing a model to the Hub:**
-
- .. code:: python
-
-     from braindecode.models import ATCNet
-
-     # Train your model
-     model = ATCNet(n_chans=22, n_outputs=4, n_times=1000)
-     # ... training code ...
-
-     # Push to the Hub
-     model.push_to_hub(
-         repo_id="username/my-atcnet-model",
-         commit_message="Initial model upload",
-     )
-
- **Loading a model from the Hub:**
-
- .. code:: python
-
-     from braindecode.models import ATCNet
-
-     # Load pretrained model
-     model = ATCNet.from_pretrained("username/my-atcnet-model")
-
-     # Load with a different number of outputs (head is rebuilt automatically)
-     model = ATCNet.from_pretrained("username/my-atcnet-model", n_outputs=4)
-
- **Extracting features and replacing the head:**
-
- .. code:: python
-
-     import torch
-
-     x = torch.randn(1, model.n_chans, model.n_times)
-     # Extract encoder features (consistent dict across all models)
-     out = model(x, return_features=True)
-     features = out["features"]
-
-     # Replace the classification head
-     model.reset_head(n_outputs=10)
-
- **Saving and restoring full configuration:**
-
- .. code:: python
-
-     import json
-
-     config = model.get_config()  # all __init__ params
-     with open("config.json", "w") as f:
-         json.dump(config, f)
-
-     model2 = ATCNet.from_config(config)  # reconstruct (no weights)
-
- All model parameters (both EEG-specific and model-specific such as
- dropout rates, activation functions, number of filters) are automatically
- saved to the Hub and restored when loading.
-
- See :ref:`load-pretrained-models` for a complete tutorial.
-
 ## Citation
 
- Please cite both the original paper for this architecture (see the
- *References* section above) and braindecode:
 
 ```bibtex
 @article{aristimunha2025braindecode,
 
 
 # ATCNet
 
+ ATCNet from Altaheri et al (2022) [1].
 
+ > **Architecture-only repository.** Documents the
 > `braindecode.models.ATCNet` class. **No pretrained weights are
+ > distributed here.** Instantiate the model and train it on your own
+ > data.
 
 ## Quick start
 
 )
 ```
 
+ The signal-shape arguments above are illustrative defaults — adjust to
+ match your recording.
 
 ## Documentation
+ - Full API reference: <https://braindecode.org/stable/generated/braindecode.models.ATCNet.html>
+ - Interactive browser (live instantiation, parameter counts):
   <https://huggingface.co/spaces/braindecode/model-explorer>
 - Source on GitHub: <https://github.com/braindecode/braindecode/blob/master/braindecode/models/atcnet.py#L15>
 
+ ## Architecture
 
+ ![ATCNet architecture](https://user-images.githubusercontent.com/25565236/185449791-e8539453-d4fa-41e1-865a-2cf7e91f60ef.png)
 
+ ## Parameters
 
+ | Parameter | Type | Description |
+ |---|---|---|
+ | `input_window_seconds` | float, optional | Time length of inputs, in seconds. Defaults to 4.5 s, as in the BCI-IV 2a dataset. |
+ | `sfreq` | int, optional | Sampling frequency of the inputs, in Hz. Defaults to 250 Hz, as in the BCI-IV 2a dataset. |
+ | `conv_block_n_filters` | int | Number of temporal filters in the first convolutional layer of the convolutional block, denoted F1 in figure 2 of the paper [1]. Defaults to 16 as in [1]. |
+ | `conv_block_kernel_length_1` | int | Length of temporal filters in the first convolutional layer of the convolutional block, denoted Kc in table 1 of the paper [1]. Defaults to 64 as in [1]. |
+ | `conv_block_kernel_length_2` | int | Length of temporal filters in the last convolutional layer of the convolutional block. Defaults to 16 as in [1]. |
+ | `conv_block_pool_size_1` | int | Length of the first average pooling kernel in the convolutional block. Defaults to 8 as in [1]. |
+ | `conv_block_pool_size_2` | int | Length of the second average pooling kernel in the convolutional block, denoted P2 in table 1 of the paper [1]. Defaults to 7 as in [1]. |
+ | `conv_block_depth_mult` | int | Depth multiplier of the depthwise convolution in the convolutional block, denoted D in table 1 of the paper [1]. Defaults to 2 as in [1]. |
+ | `conv_block_dropout` | float | Dropout probability used in the convolution block, denoted pc in table 1 of the paper [1]. Defaults to 0.3 as in [1]. |
+ | `n_windows` | int | Number of sliding windows, denoted n in [1]. Defaults to 5 as in [1]. |
+ | `head_dim` | int | Embedding dimension used in each self-attention head, denoted dh in table 1 of the paper [1]. Defaults to 8 as in [1]. |
+ | `num_heads` | int | Number of attention heads, denoted H in table 1 of the paper [1]. Defaults to 2 as in [1]. |
+ | `att_dropout` | float | Dropout probability used in the attention block, denoted pa in table 1 of the paper [1]. Defaults to 0.5 as in [1]. |
+ | `tcn_depth` | int | Depth of the Temporal Convolutional Network block (i.e. number of TCN residual blocks), denoted L in table 1 of the paper [1]. Defaults to 2 as in [1]. |
+ | `tcn_kernel_size` | int | Temporal kernel size used in the TCN block, denoted Kt in table 1 of the paper [1]. Defaults to 4 as in [1]. |
+ | `tcn_dropout` | float | Dropout probability used in the TCN block, denoted pt in table 1 of the paper [1]. Defaults to 0.3 as in [1]. |
+ | `tcn_activation` | torch.nn.Module | Nonlinear activation to use. Defaults to `nn.ELU()`. |
+ | `concat` | bool | When `True`, concatenates each sliding-window embedding before feeding it to a fully-connected layer, as done in [1]. When `False`, maps each sliding window to `n_outputs` logits and averages them. Defaults to `False`, contrary to what is reported in [1] but matching what the official code does [2]. |
+ | `max_norm_const` | float | Maximum L2-norm constraint imposed on the weights of the last fully-connected layer. Defaults to 0.25. |
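An example tying the table's defaults together: `n_times` follows from `input_window_seconds × sfreq`, and `concat=True` selects the paper's concatenation variant (values are the table defaults, not a tuned configuration):

```python
from braindecode.models import ATCNet

input_window_seconds, sfreq = 4.5, 250        # table defaults (BCI-IV 2a)
n_times = int(input_window_seconds * sfreq)   # 1125 samples

# paper-style aggregation: concatenate window embeddings before the readout
model = ATCNet(n_chans=22, n_outputs=4, n_times=n_times, concat=True)
```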
 
+ ## References
 
+ 1. H. Altaheri, G. Muhammad, M. Alsulaiman (2022). *Physics-informed attention temporal convolutional network for EEG-based motor imagery classification.* IEEE Transactions on Industrial Informatics. doi:10.1109/TII.2022.3197419.
+ 2. Official EEG-ATCNet implementation (TensorFlow): https://github.com/Altaheri/EEG-ATCNet/blob/main/models.py
 
 ## Citation
 
+ Cite the original architecture paper (see *References* above) and braindecode:
 
 ```bibtex
 @article{aristimunha2025braindecode,