braindecode
/

MetaNeuromotorHand

@@ -14,13 +14,12 @@ tags:
 # MetaNeuromotorHand
-Generic neuromotor interface for handwriting from Meta (2025) .
-> **Architecture-only repository.** This repo documents the
 > `braindecode.models.MetaNeuromotorHand` class. **No pretrained weights are
-> distributed here** — instantiate the model and train it on your own
-> data, or fine-tune from a published foundation-model checkpoint
-> separately.
 ## Quick start
@@ -39,394 +38,67 @@ model = MetaNeuromotorHand(
 )
 ```
-The signal-shape arguments above are example defaults — adjust them
-to match your recording.
 ## Documentation
-- Full API reference (parameters, references, architecture figure):
-  <https://braindecode.org/stable/generated/braindecode.models.MetaNeuromotorHand.html>
-- Interactive browser with live instantiation:
   <https://huggingface.co/spaces/braindecode/model-explorer>
 - Source on GitHub: <https://github.com/braindecode/braindecode/blob/master/braindecode/models/meta_neuromotor.py#L34>
-## Architecture description
-The block below is the rendered class docstring (parameters,
-references, architecture figure where available).
-<div class='bd-doc'><main>
-<p>Generic neuromotor interface for handwriting from Meta (2025) [gni2025]_.</p>
-<span style="display:inline-block;padding:2px 8px;border-radius:4px;background:#56B4E9;color:white;font-size:11px;font-weight:600;margin-right:4px;">Attention/Transformer</span><span style="display:inline-block;padding:2px 8px;border-radius:4px;background:#5cb85c;color:white;font-size:11px;font-weight:600;margin-right:4px;">Convolution</span>
- .. figure:: https://media.springernature.com/full/springer-static/image/art%3A10.1038%2Fs41586-025-09255-w/MediaObjects/41586_2025_9255_Fig1_HTML.png
-     :align: center
-     :alt: Platform and decoding pipeline from the Nature paper (Figure 1).
-     :width: 700px
-     Figure 1 from the paper [gni2025]_ - *"A hardware and software
-     platform for high-throughput recording and real-time decoding of
-     sEMG at the wrist."* Shows the 16-channel sEMG-RD wristband, the
-     three tasks (handwriting, gestures, wrist control) and the
-     per-task decoding pipeline at a block level.
- Conformer-based surface-EMG-to-character decoder for the handwriting
- task of Meta's generic neuromotor interface (CTRL-labs at Reality
- Labs, Nature 2025). Takes raw 16-channel surface EMG recorded at the
- wrist and emits a per-token score sequence for CTC decoding
- [graves2006ctc]_. The upstream repository
- (`facebookresearch/generic-neuromotor-interface
- <https://github.com/facebookresearch/generic-neuromotor-interface>`_)
- ships one architecture per task: 1-DOF wrist control, discrete
- gestures and handwriting. Only the handwriting head is ported here.
- .. rubric:: Macro Components
- The forward pass is a strict sequence of five modules, in order:
- 1. ``_MultivariatePowerFrequencyFeatures`` (MPF features, fixed
-    signal-processing stage, no trainable parameters).
-    - Channel-wise STFT (:func:`torch.stft`) -- ``n_fft=64`` (32 ms),
-      hop ``10`` (5 ms), Hann window.
-    - Strided windowing of consecutive STFT bins into
-      ``mpf_window_length`` (80 ms) windows sliding every
-      ``mpf_stride`` (20 ms).
-    - Per-pair cross-spectral density across channels, squared
-      magnitude.
-    - Frequency-band averaging over 6 bands
-      (0-50, 30-100, 100-225, 225-375, 375-700, 700-1000 Hz).
-    - SPD matrix logarithm via eigendecomposition
-      (Barachant et al. 2012; [pyriemann]_).
-    Output shape ``(batch, num_freq_bins, n_chans, n_chans, time')``
-    at 50 Hz (= ``sfreq / mpf_stride``).
- 2. ``_MaskAug`` -- SpecAugment [park2019specaug]_ on the MPF
-    features during training, no-op at eval. Zero parameters.
-    Hyperparameters ``mask_max_num_masks=(3, 2)`` and
-    ``mask_max_lengths=(5, 1)`` match the released checkpoints.
- 3. ``_RotationInvariantMPFMLP`` -- armband-rotation invariance.
-    - Circular roll of the 16-channel cross-spectral matrix by each
-      offset in ``invariance_offsets`` (default ``{-1, 0, +1}``).
-    - Vectorize upper triangle keeping only ``num_adjacent_cov``
-      off-diagonals (assumes circular adjacency of the armband).
-    - Shared MLP applied to each rotated vector.
-    - Mean-pool across rotations -- enforces approximate invariance
-      to rigid rotations of the armband around the wrist.
-    Output shape ``(batch, hidden_dim, time')`` with
-    ``hidden_dim = 64`` by default.
- 4. Causal conformer encoder [gulati2020conformer]_.
-    - Block structure: FF(half) -> windowed causal multi-head
-      attention -> depthwise convolution -> FF(half) ->
-      :class:`torch.nn.LayerNorm`.
-    - Depth: 15 blocks. The paper's schedule has stride ``2`` at
-      blocks 5 and 10 (total 4x temporal downsampling) and attention
-      window ``16`` for blocks 1-10 then ``8`` for blocks 11-15.
-    - Causality: attention is restricted to a fixed local window
-      ending at the current frame, so the encoder runs as a streaming
-      causal decoder. A frame-stacking step before the stack halves
-      the frame rate once more.
- 5. :class:`torch.nn.Linear` classification head, optionally followed
-    by :func:`torch.nn.functional.log_softmax`. The final linear
-    projects to ``n_outputs`` (vocabulary size, default ``100``).
-    Log-softmax is gated by ``log_softmax``; disabled by default
-    since braindecode models conventionally return logits.
- .. rubric:: Hardware, signal and training corpus
- The upstream sEMG-RD research wristband has 48 electrode pins
- arranged as 16 bipolar channels aligned with the proximal-distal
- forearm axis, a 2 kHz sample rate, a ~2.46 uVrms noise floor, and
- an analog front-end with a 20 Hz high-pass and 850 Hz low-pass.
- Before featurization the raw signal is rescaled by ``2.46e-6``
- (to unit noise s.d.) and digitally high-passed at 40 Hz (4th-order
- Butterworth) to suppress motion artifacts.
- The published handwriting decoder was trained on recordings from
- ~6,627 participants (~1 h 15 min each) prompted to "write" text
- sampled from Simple English Wikipedia, the Google Schema-guided
- Dialogue dataset and Reddit, in three postures (seated on surface,
- seated on leg, standing on leg). Participants wrote letters, digits,
- words and phrases; spaces were either implicit or prompted by a
- right-dash token produced via a right-index swipe. Training sizes
- scale geometrically from 25 to 6,527 participants; validation and
- test sets hold 50 participants each.
- .. rubric:: MPF featurizer (paper defaults)
- ``sEMG (2 kHz)`` ->
- ``STFT(n_fft=64 samples / 32 ms, hop=10 samples / 5 ms)`` ->
- per-pair complex cross-spectrum -> squared magnitude, band-averaged
- into 6 bins, then matrix-log on each 16x16 SPD matrix, produced
- every ``mpf_stride = 40 samples (20 ms)`` over a
- ``mpf_window_length = 160 samples (80 ms)`` window. Output rate:
- 50 Hz before the conformer's ``time_reduction_stride`` and the
- 2x internal strides.
- The paper's frequency bins are non-overlapping (0-62.5, 62.5-125,
- 125-250, 250-375, 375-687.5, 687.5-1000 Hz), but the upstream
- training config -- matched by the ``mpf_frequency_bins`` default --
- uses slightly overlapping bins (0-50, 30-100, 100-225, 225-375,
- 375-700, 700-1000 Hz); the code default reproduces the released
- checkpoints.
- .. rubric:: Training recipe (paper values, not defaults of this class)
- - **Loss**: CTC [graves2006ctc]_ with FastEmit regularization
-   [fastemit2021]_ to reduce streaming latency.
- - **Vocabulary**: lowercase ``[a-z]``, digits ``[0-9]``, punctuation
-   ``[,.?'!]`` and four control gestures (``space``, ``dash``,
-   ``backspace``, ``pinch``); the deployed networks used
-   ``vocab_size = 100`` (the default) to reserve blank / unused
-   slots. Greedy CTC decoding (collapse repeats) was used at test.
- - **Optimizer**: AdamW, ``weight_decay = 5e-2``.
- - **Learning rate**: cosine annealing from ``6e-4`` (1 M-parameter
-   model) or ``3e-4`` (60 M) with a 1,500-step warmup and
-   ``min_lr = 0``.
- - **Batching**: global batch size 512 (= 32 processes x 16),
-   prompts zero-padded to the longest in the batch; gradient
-   clipping at norm ``0.1``; 200 epochs. Training the largest model
-   took ~4 d 17 h on 4 x NVIDIA A10G GPUs.
- - **Augmentation**: SpecAugment on the MPF features (time and
-   frequency masks; ``mask_max_num_masks=(3, 2)``,
-   ``mask_max_lengths=(5, 1)``) plus random circular channel
-   rotations of ``{-1, 0, +1}``.
- Reported closed-loop performance: ``20.9 WPM`` on held-out naive
- users (n = 20), compared with ``25.1 WPM`` on a pen-and-paper
- baseline and ``36 WPM`` on a mobile keyboard; personalization with
- 20 min of data improves offline CER by ~16 %.
- .. rubric:: Output shape and CTC usage
- The forward pass returns a tensor of shape
- ``(batch, T_out, n_outputs)``, the natural layout for CTC.
- ``T_out`` is the downsampled emission sequence length and can be
- obtained from the input length via :meth:`compute_output_lengths`.
- For :class:`torch.nn.CTCLoss`, move the time dimension first:
- ``emissions.transpose(0, 1)``.
- .. warning::
-     The rotation-invariant MLP assumes circular channel adjacency
-     (the 16-electrode EMG armband used in the paper). For arbitrary
-     EEG montages the rotation invariance is not meaningful and this
-     model should not be used as-is.
- .. warning::
-     **License -- noncommercial use only.** This module is a
-     derivative of Meta's reference implementation and is released
-     under `CC BY-NC 4.0
-     <https://creativecommons.org/licenses/by-nc/4.0/>`_, the same
-     license as the upstream repository. The paper itself is
-     distributed under CC BY-NC-ND 4.0. Neither is covered by
-     braindecode's BSD-3 license, and both must not be used in
-     commercial products or services. Using the pretrained weights
-     carries the same restriction.
- .. versionadded:: 1.5
- Parameters
- ----------
- n_outputs : int
-     Vocabulary size for CTC. Defaults to ``100`` (handwriting
-     charset).
- n_chans : int
-     Number of EMG channels. Defaults to ``16`` (one armband).
- sfreq : float
-     Sampling frequency in Hz. Defaults to ``2000``.
- mpf_window_length : int
-     MPF window length in samples.
- mpf_stride : int
-     MPF frame stride in samples.
- mpf_n_fft : int
-     STFT window / FFT size.
- mpf_fft_stride : int
-     STFT hop size. Must divide ``mpf_stride`` and be
-     ``<= mpf_n_fft``.
- mpf_frequency_bins : sequence of (float, float) or None
-     ``(low, high)`` Hz bands to average the cross-spectrum over.
-     If ``None``, all FFT frequency bins are used.
- mask_max_num_masks : sequence of int
-     Max number of SpecAugment masks per dim (order matches
-     ``mask_dims``).
- mask_max_lengths : sequence of int
-     Max mask length per dim (order matches ``mask_dims``).
- mask_dims : str
-     Axes to mask, among ``"CFT"``. Defaults to ``"TF"``.
- mask_value : float
-     Filler value for masked regions.
- invariance_hidden_dims : sequence of int
-     Hidden layer sizes of the per-rotation MLP. Output feature dim
-     is ``invariance_hidden_dims[-1]``.
- invariance_offsets : sequence of int
-     Circular channel rotations to average over.
- num_adjacent_cov : int
-     Number of adjacent off-diagonals of the cross-channel
-     covariance matrix to keep.
- conformer_input_dim : int
-     Conformer embedding dimension ``D``.
- conformer_ffn_dim : int
-     Feed-forward hidden dim inside each block.
- conformer_kernel_size : int or sequence of int
-     Depthwise-conv kernel size per block.
- conformer_stride : int or sequence of int
-     Depthwise-conv stride per block. As a scalar, applied only to
-     the last block (entire encoder downsamples by ``stride``); as a
-     sequence of length ``conformer_num_layers``, applied per block.
-     Defaults to the paper's 15-layer schedule
-     ``(1, 1, 1, 1, 2) * 2 + (1,) * 5`` (2x downsampling at blocks 5
-     and 10). When overriding ``conformer_num_layers``, also pass a
-     matching schedule or a scalar.
- conformer_num_heads : int
-     Number of attention heads.
- conformer_attn_window_size : int or sequence of int
-     Attention receptive field per block. Defaults to the paper's
-     15-layer schedule ``(16,) * 10 + (8,) * 5``. When overriding
-     ``conformer_num_layers``, also pass a matching schedule or a
-     scalar.
- conformer_num_layers : int
-     Number of conformer blocks.
- drop_prob : float
-     Dropout probability applied throughout the conformer (FFN,
-     conv and attention blocks).
- time_reduction_stride : int
-     Frame-stacking stride applied **before** the conformer.
-     ``1`` disables it.
- log_softmax : bool
-     If ``True``, apply :func:`torch.nn.functional.log_softmax` to
-     the emissions. Disabled by default (braindecode models return
-     logits).
- activation : type of nn.Module
-     Activation class used inside the conformer feed-forward and
-     convolution blocks. Defaults to :class:`torch.nn.SiLU`.
- invariance_activation : type of nn.Module
-     Activation class used inside the rotation-invariant MLP.
-     Defaults to :class:`torch.nn.LeakyReLU`.
- Examples
- --------
- Load Meta's pretrained handwriting checkpoint (`download script`_
- in the upstream repo)::
-     import torch
-     from braindecode.models import MetaNeuromotorHand
-     ckpt = torch.load("model_checkpoint.ckpt", weights_only=False)
-     sd = {
-         k[len("network."):]: v
-         for k, v in ckpt["state_dict"].items()
-         if k.startswith("network.")
-     }
-     model = MetaNeuromotorHand(n_times=32000, log_softmax=True)
-     # load_state_dict applies the class-level ``mapping`` for
-     # upstream keys.
-     model.load_state_dict(sd, strict=True)
- .. _download script: https://github.com/facebookresearch/generic-neuromotor-interface#download-the-data-and-models
- References
- ----------
- .. [gni2025] CTRL-labs at Reality Labs (Kaifosh, P., Reardon, T. R.
-     et al.), 2025. A generic non-invasive neuromotor interface for
-     human-computer interaction. Nature 645, 702-710.
-     https://doi.org/10.1038/s41586-025-09255-w
- .. [gulati2020conformer] Gulati, A. et al., 2020. Conformer:
-     convolution-augmented transformer for speech recognition.
-     Proc. Interspeech, 5036-5040.
- .. [graves2006ctc] Graves, A., Fernandez, S., Gomez, F.,
-     Schmidhuber, J., 2006. Connectionist temporal classification:
-     labelling unsegmented sequence data with recurrent neural
-     networks. Proc. ICML, 369-376.
- .. [park2019specaug] Park, D. S. et al., 2019. SpecAugment:
-     a simple data augmentation method for automatic speech
-     recognition. Proc. Interspeech, 2613-2617.
- .. [fastemit2021] Yu, J. et al., 2021. FastEmit: low-latency
-     streaming ASR with sequence-level emission regularization.
-     Proc. ICASSP.
- .. [pyriemann] Barachant, A., Barthelemy, Q., King, J.-R., Gramfort,
-     A., Chevallier, S., Rodrigues, P. L. C., ... Aristimunha, B.,
-     2026. pyRiemann (v0.10). Zenodo.
-     https://doi.org/10.5281/zenodo.593816
- .. rubric:: Hugging Face Hub integration
- When the optional ``huggingface_hub`` package is installed, all models
- automatically gain the ability to be pushed to and loaded from the
- Hugging Face Hub. Install with::
-     pip install braindecode[hub]
- **Pushing a model to the Hub:**
- .. code::
-     from braindecode.models import MetaNeuromotorHand
-     # Train your model
-     model = MetaNeuromotorHand(n_chans=22, n_outputs=4, n_times=1000)
-     # ... training code ...
-     # Push to the Hub
-     model.push_to_hub(
-         repo_id="username/my-metaneuromotorhand-model",
-         commit_message="Initial model upload",
-     )
- **Loading a model from the Hub:**
- .. code::
-     from braindecode.models import MetaNeuromotorHand
-     # Load pretrained model
-     model = MetaNeuromotorHand.from_pretrained("username/my-metaneuromotorhand-model")
-     # Load with a different number of outputs (head is rebuilt automatically)
-     model = MetaNeuromotorHand.from_pretrained("username/my-metaneuromotorhand-model", n_outputs=4)
- **Extracting features and replacing the head:**
- .. code::
-     import torch
-     x = torch.randn(1, model.n_chans, model.n_times)
-     # Extract encoder features (consistent dict across all models)
-     out = model(x, return_features=True)
-     features = out["features"]
-     # Replace the classification head
-     model.reset_head(n_outputs=10)
- **Saving and restoring full configuration:**
- .. code::
-     import json
-     config = model.get_config()            # all __init__ params
-     with open("config.json", "w") as f:
-         json.dump(config, f)
-     model2 = MetaNeuromotorHand.from_config(config)    # reconstruct (no weights)
- All model parameters (both EEG-specific and model-specific such as
- dropout rates, activation functions, number of filters) are automatically
- saved to the Hub and restored when loading.
- See :ref:`load-pretrained-models` for a complete tutorial.</main>
-</div>
 ## Citation
-Please cite both the original paper for this architecture (see the
-*References* section above) and braindecode:
 ```bibtex
 @article{aristimunha2025braindecode,

 # MetaNeuromotorHand
+Generic neuromotor interface for handwriting from Meta (2025) [gni2025].
+> **Architecture-only repository.** Documents the
 > `braindecode.models.MetaNeuromotorHand` class. **No pretrained weights are
+> distributed here.** Instantiate the model and train it on your own
+> data.
 ## Quick start
 )
 ```
+The signal-shape arguments above are illustrative defaults — adjust to
+match your recording.
 ## Documentation
+- Full API reference: <https://braindecode.org/stable/generated/braindecode.models.MetaNeuromotorHand.html>
+- Interactive browser (live instantiation, parameter counts):
   <https://huggingface.co/spaces/braindecode/model-explorer>
 - Source on GitHub: <https://github.com/braindecode/braindecode/blob/master/braindecode/models/meta_neuromotor.py#L34>
+## Architecture
+![MetaNeuromotorHand architecture](https://media.springernature.com/full/springer-static/image/art%3A10.1038%2Fs41586-025-09255-w/MediaObjects/41586_2025_9255_Fig1_HTML.png)
+## Parameters
+| Parameter | Type | Description |
+|---|---|---|
+| `n_outputs` | int | Vocabulary size for CTC. Defaults to `100` (handwriting charset). |
+| `n_chans` | int | Number of EMG channels. Defaults to `16` (one armband). |
+| `sfreq` | float | Sampling frequency in Hz. Defaults to `2000`. |
+| `mpf_window_length` | int | MPF window length in samples. |
+| `mpf_stride` | int | MPF frame stride in samples. |
+| `mpf_n_fft` | int | STFT window / FFT size. |
+| `mpf_fft_stride` | int | STFT hop size. Must divide `mpf_stride` and be `<= mpf_n_fft`. |
+| `mpf_frequency_bins` | sequence of (float, float) or None | `(low, high)` Hz bands to average the cross-spectrum over. If `None`, all FFT frequency bins are used. |
+| `mask_max_num_masks` | sequence of int | Max number of SpecAugment masks per dim (order matches `mask_dims`). |
+| `mask_max_lengths` | sequence of int | Max mask length per dim (order matches `mask_dims`). |
+| `mask_dims` | str | Axes to mask, among `"CFT"`. Defaults to `"TF"`. |
+| `mask_value` | float | Filler value for masked regions. |
+| `invariance_hidden_dims` | sequence of int | Hidden layer sizes of the per-rotation MLP. Output feature dim is `invariance_hidden_dims[-1]`. |
+| `invariance_offsets` | sequence of int | Circular channel rotations to average over. |
+| `num_adjacent_cov` | int | Number of adjacent off-diagonals of the cross-channel covariance matrix to keep. |
+| `conformer_input_dim` | int | Conformer embedding dimension `D`. |
+| `conformer_ffn_dim` | int | Feed-forward hidden dim inside each block. |
+| `conformer_kernel_size` | int or sequence of int | Depthwise-conv kernel size per block. |
+| `conformer_stride` | int or sequence of int | Depthwise-conv stride per block. As a scalar, applied only to the last block (entire encoder downsamples by `stride`); as a sequence of length `conformer_num_layers`, applied per block. Defaults to the paper's 15-layer schedule `(1, 1, 1, 1, 2) * 2 + (1,) * 5` (2x downsampling at blocks 5 and 10). When overriding `conformer_num_layers`, also pass a matching schedule or a scalar. |
+| `conformer_num_heads` | int | Number of attention heads. |
+| `conformer_attn_window_size` | int or sequence of int | Attention receptive field per block. Defaults to the paper's 15-layer schedule `(16,) * 10 + (8,) * 5`. When overriding `conformer_num_layers`, also pass a matching schedule or a scalar. |
+| `conformer_num_layers` | int | Number of conformer blocks. |
+| `drop_prob` | float | Dropout probability applied throughout the conformer (FFN, conv and attention blocks). |
+| `time_reduction_stride` | int | Frame-stacking stride applied **before** the conformer. `1` disables it. |
+| `log_softmax` | bool | If `True`, apply :func:`torch.nn.functional.log_softmax` to the emissions. Disabled by default (braindecode models return logits). |
+| `activation` | type of nn.Module | Activation class used inside the conformer feed-forward and convolution blocks. Defaults to :class:`torch.nn.SiLU`. |
+| `invariance_activation` | type of nn.Module | Activation class used inside the rotation-invariant MLP. Defaults to :class:`torch.nn.LeakyReLU`. |
+## References
+1. CTRL-labs at Reality Labs (Kaifosh, P., Reardon, T. R. et al.), 2025. A generic non-invasive neuromotor interface for human-computer interaction. Nature 645, 702-710. https://doi.org/10.1038/s41586-025-09255-w
+2. Gulati, A. et al., 2020. Conformer: convolution-augmented transformer for speech recognition. Proc. Interspeech, 5036-5040.
+3. Graves, A., Fernandez, S., Gomez, F., Schmidhuber, J., 2006. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. Proc. ICML, 369-376.
+4. Park, D. S. et al., 2019. SpecAugment: a simple data augmentation method for automatic speech recognition. Proc. Interspeech, 2613-2617.
+5. Yu, J. et al., 2021. FastEmit: low-latency streaming ASR with sequence-level emission regularization. Proc. ICASSP.
+6. Barachant, A., Barthelemy, Q., King, J.-R., Gramfort, A., Chevallier, S., Rodrigues, P. L. C., ... Aristimunha, B., 2026. pyRiemann (v0.10). Zenodo. https://doi.org/10.5281/zenodo.593816
 ## Citation
+Cite the original architecture paper (see *References* above) and braindecode:
 ```bibtex
 @article{aristimunha2025braindecode,