bruAristimunha committed
Commit 519a9ea · verified · 1 Parent(s): a22ba3d

Replace with clean markdown card

Files changed (1):
  1. README.md +34 -174
README.md CHANGED
@@ -16,11 +16,10 @@ tags:
 
 CodeBrain: Scalable Code EEG Pre-Training for Unified Downstream BCI Tasks.
 
- > **Architecture-only repository.** This repo documents the
 > `braindecode.models.CodeBrain` class. **No pretrained weights are
- > distributed here** instantiate the model and train it on your own
- > data, or fine-tune from a published foundation-model checkpoint
- > separately.
 
 ## Quick start
 
@@ -39,187 +38,48 @@ model = CodeBrain(
 )
 ```
 
- The signal-shape arguments above are example defaults — adjust them
- to match your recording.
 
 ## Documentation
-
- - Full API reference (parameters, references, architecture figure):
-   <https://braindecode.org/stable/generated/braindecode.models.CodeBrain.html>
- - Interactive browser with live instantiation:
   <https://huggingface.co/spaces/braindecode/model-explorer>
 - Source on GitHub: <https://github.com/braindecode/braindecode/blob/master/braindecode/models/codebrain.py#L21>
 
- ## Architecture description
-
- The block below is the rendered class docstring (parameters,
- references, architecture figure where available).
-
- <div class='bd-doc'><main>
- <p>CodeBrain: Scalable Code EEG Pre-Training for Unified Downstream BCI Tasks.</p>
- <span style="display:inline-block;padding:2px 8px;border-radius:4px;background:#d9534f;color:white;font-size:11px;font-weight:600;margin-right:4px;">Foundation Model</span><span style="display:inline-block;padding:2px 8px;border-radius:4px;background:#56B4E9;color:white;font-size:11px;font-weight:600;margin-right:4px;">Attention/Transformer</span>
-
- .. figure:: https://raw.githubusercontent.com/jingyingma01/CodeBrain/refs/heads/main/assets/intro.png
-    :align: center
-    :alt: CodeBrain pre-training overview
-    :width: 1000px
-
- CodeBrain is a foundation model for EEG that pre-trains on large unlabelled
- corpora using a two-stage vector-quantised masking strategy, then fine-tunes
- on downstream BCI tasks. It segments EEG signals into fixed-size patches,
- embeds them with convolutional and spectral projections, and processes them
- through stacked residual blocks that combine a multi-scale convolutional
- structured state-space model (``_GConv``) with sliding-window self-attention.
-
- .. rubric:: Stage 2: EEGSSM Backbone (this implementation)
-
- This class implements Stage 2 of CodeBrain — the EEGSSM backbone described
- in Section 3.3 of [codebrain]_. Following :class:`Labram`, CodeBrain
- discretises EEG patches into codebook tokens via VQ-VAE (Stage 1, not
- implemented here), then trains the backbone to predict masked token indices
- via cross-entropy. CodeBrain extends this with a *dual* tokenizer that
- decouples temporal and frequency representations, as stated in the paper:
- *"the TFDual-Tokenizer, which decouples heterogeneous temporal and frequency
- EEG signals into discrete tokens to enhance discriminative power."*
-
- .. rubric:: Macro Components
-
- - **PatchEmbedding**: Splits ``(batch, n_chans, n_times)`` into
-   ``(batch, n_chans, seq_len, patch_size)`` patches, projects each patch
-   with a 2-D convolutional stack, adds FFT-based spectral embeddings, and
-   applies depth-wise convolutional positional encoding.
- - **Residual blocks** (``ResidualGroup``): Each block applies RMSNorm,
-   a ``_GConv`` SSM layer, and sliding-window multi-head attention, with
-   gated activation and separate residual/skip paths.
- - **Classification head** (``final_layer``): Flattens the output and maps
-   to ``n_outputs`` classes.
-
- .. important::
-    **Pre-trained Weights Available**
-
-    This model has pre-trained weights available on the Hugging Face Hub.
-    You can load them using:
-
-    .. code:: python
-
-       from braindecode.models import CodeBrain
-
-       # Load pre-trained model from Hugging Face Hub
-       model = CodeBrain.from_pretrained("braindecode/codebrain-pretrained")
-
-    To push your own trained model to the Hub:
-
-    .. code:: python
-
-       model.push_to_hub("my-username/my-codebrain")
-
- Parameters
- ----------
- patch_size : int, default=200
-     Number of time samples per patch. Input length is trimmed to the
-     nearest multiple of ``patch_size``.
- res_channels : int, default=200
-     Width of the residual stream inside each ``ResidualBlock``.
- skip_channels : int, default=200
-     Width of the skip-connection stream aggregated across blocks.
- out_channels : int, default=200
-     Output channels of ``final_conv`` before the classification head.
- num_res_layers : int, default=8
-     Number of stacked ``ResidualBlock`` modules.
- drop_prob : float, default=0.1
-     Dropout rate used inside the ``_GConv`` SSM and attention layers.
- s4_bidirectional : bool, default=True
-     Whether the ``_GConv`` SSM processes the sequence bidirectionally.
- s4_layernorm : bool, default=False
-     Whether to apply layer normalisation inside the ``_GConv`` SSM.
-     Set to ``False`` to match the released pretrained checkpoint.
- s4_lmax : int, default=570
-     Maximum sequence length for the ``_GConv`` SSM kernel. Also determines
-     the patch embedding dimension as ``s4_lmax // n_chans``.
- s4_d_state : int, default=64
-     State dimension of the ``_GConv`` SSM.
- conv_out_chans : int, default=25
-     Number of output channels in the patch projection convolutions.
- conv_groups : int, default=5
-     Number of groups for ``GroupNorm`` in the patch projection.
- activation : type[nn.Module], default=nn.ReLU
-     Non-linear activation class used in ``init_conv`` and ``final_conv``.
 
- References
- ----------
- .. [codebrain] Yi Ding, Xuyang Chen, Yong Li, Rui Yan, Tao Wang, Le Wu (2025).
-    CodeBrain: Scalable Code EEG Pre-Training for Unified Downstream BCI Tasks.
-    https://arxiv.org/abs/2506.09110
-
- .. rubric:: Hugging Face Hub integration
-
- When the optional ``huggingface_hub`` package is installed, all models
- automatically gain the ability to be pushed to and loaded from the
- Hugging Face Hub. Install with::
-
-    pip install braindecode[hub]
-
- **Pushing a model to the Hub:**
-
- .. code::
-
-    from braindecode.models import CodeBrain
-
-    # Train your model
-    model = CodeBrain(n_chans=22, n_outputs=4, n_times=1000)
-    # ... training code ...
-
-    # Push to the Hub
-    model.push_to_hub(
-        repo_id="username/my-codebrain-model",
-        commit_message="Initial model upload",
-    )
-
- **Loading a model from the Hub:**
-
- .. code::
-
-    from braindecode.models import CodeBrain
-
-    # Load pretrained model
-    model = CodeBrain.from_pretrained("username/my-codebrain-model")
-
-    # Load with a different number of outputs (head is rebuilt automatically)
-    model = CodeBrain.from_pretrained("username/my-codebrain-model", n_outputs=4)
-
- **Extracting features and replacing the head:**
-
- .. code::
-
-    import torch
-
-    x = torch.randn(1, model.n_chans, model.n_times)
-    # Extract encoder features (consistent dict across all models)
-    out = model(x, return_features=True)
-    features = out["features"]
 
-    # Replace the classification head
-    model.reset_head(n_outputs=10)
 
- **Saving and restoring full configuration:**
-
- .. code::
-
-    import json
-
-    config = model.get_config() # all __init__ params
-    with open("config.json", "w") as f:
-        json.dump(config, f)
-
-    model2 = CodeBrain.from_config(config) # reconstruct (no weights)
-
- All model parameters (both EEG-specific and model-specific such as
- dropout rates, activation functions, number of filters) are automatically
- saved to the Hub and restored when loading.
-
- See :ref:`load-pretrained-models` for a complete tutorial.</main>
- </div>
 
 ## Citation
 
- Please cite both the original paper for this architecture (see the
- *References* section above) and braindecode:
 
 ```bibtex
 @article{aristimunha2025braindecode,
 
 
 CodeBrain: Scalable Code EEG Pre-Training for Unified Downstream BCI Tasks.
 
+ > **Architecture-only repository.** Documents the
 > `braindecode.models.CodeBrain` class. **No pretrained weights are
+ > distributed here.** Instantiate the model and train it on your own
+ > data.
 
 ## Quick start
 
 )
 ```
 
+ The signal-shape arguments above are illustrative defaults — adjust to
+ match your recording.
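
As a rule of thumb, `n_times` is just the sampling rate in Hz multiplied by the window length in seconds. The helper below is a hypothetical sketch for this card, not a braindecode function:

```python
# Hypothetical helper (not part of braindecode): n_times is simply
# sampling rate (Hz) x window length (s), rounded to an integer.
def n_times_for(sfreq_hz: float, window_s: float) -> int:
    return int(round(sfreq_hz * window_s))

# A 4 s window at 250 Hz gives n_times=1000.
print(n_times_for(250, 4))
```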
 
 ## Documentation
+ - Full API reference: <https://braindecode.org/stable/generated/braindecode.models.CodeBrain.html>
+ - Interactive browser (live instantiation, parameter counts):
   <https://huggingface.co/spaces/braindecode/model-explorer>
 - Source on GitHub: <https://github.com/braindecode/braindecode/blob/master/braindecode/models/codebrain.py#L21>
 
+ ## Architecture
+
+ ![CodeBrain architecture](https://raw.githubusercontent.com/jingyingma01/CodeBrain/refs/heads/main/assets/intro.png)
+
+ ## Parameters
+
+ | Parameter | Type | Description |
+ |---|---|---|
+ | `patch_size` | int, default=200 | Number of time samples per patch. Input length is trimmed to the nearest multiple of `patch_size`. |
+ | `res_channels` | int, default=200 | Width of the residual stream inside each `ResidualBlock`. |
+ | `skip_channels` | int, default=200 | Width of the skip-connection stream aggregated across blocks. |
+ | `out_channels` | int, default=200 | Output channels of `final_conv` before the classification head. |
+ | `num_res_layers` | int, default=8 | Number of stacked `ResidualBlock` modules. |
+ | `drop_prob` | float, default=0.1 | Dropout rate used inside the `_GConv` SSM and attention layers. |
+ | `s4_bidirectional` | bool, default=True | Whether the `_GConv` SSM processes the sequence bidirectionally. |
+ | `s4_layernorm` | bool, default=False | Whether to apply layer normalisation inside the `_GConv` SSM. Set to `False` to match the released pretrained checkpoint. |
+ | `s4_lmax` | int, default=570 | Maximum sequence length for the `_GConv` SSM kernel. Also determines the patch embedding dimension as `s4_lmax // n_chans`. |
+ | `s4_d_state` | int, default=64 | State dimension of the `_GConv` SSM. |
+ | `conv_out_chans` | int, default=25 | Number of output channels in the patch projection convolutions. |
+ | `conv_groups` | int, default=5 | Number of groups for `GroupNorm` in the patch projection. |
+ | `activation` | type[nn.Module], default=nn.ReLU | Non-linear activation class used in `init_conv` and `final_conv`. |
+
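
Two defaults in the table interact with the input shape: the input length is trimmed by `patch_size`, and the patch embedding dimension is `s4_lmax // n_chans`. A minimal sketch of those two rules, assuming "trimmed to the nearest multiple" means the nearest lower multiple; the function itself is illustrative, not a braindecode API:

```python
# Illustrative sketch (not a braindecode API) of two shape rules from
# the parameter table: the patch_size length trim and the patch
# embedding dimension.
def codebrain_shapes(n_times: int, n_chans: int,
                     patch_size: int = 200, s4_lmax: int = 570) -> dict:
    # Assumption: the trim drops the ragged tail (floor to a multiple).
    trimmed = (n_times // patch_size) * patch_size
    return {
        "trimmed_n_times": trimmed,
        "n_patches": trimmed // patch_size,
        "embed_dim": s4_lmax // n_chans,  # per the s4_lmax row above
    }
```

For example, with 22 channels and 1050 samples at the defaults, 1000 samples are kept, giving 5 patches and an embedding dimension of 25.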
+ ## References
+
+ 1. Yi Ding, Xuyang Chen, Yong Li, Rui Yan, Tao Wang, Le Wu (2025). CodeBrain: Scalable Code EEG Pre-Training for Unified Downstream BCI Tasks. https://arxiv.org/abs/2506.09110
 
 ## Citation
 
+ Cite the original architecture paper (see *References* above) and braindecode:
 
 ```bibtex
 @article{aristimunha2025braindecode,