@@ -10,18 +10,16 @@ tags:
   - braindecode
   - foundation-model
   - convolutional
-  - transformer
 ---
 # MEDFormer
-Medformer from Wang et al (2024) .
-> **Architecture-only repository.** This repo documents the
 > `braindecode.models.MEDFormer` class. **No pretrained weights are
-> distributed here** — instantiate the model and train it on your own
-> data, or fine-tune from a published foundation-model checkpoint
-> separately.
 ## Quick start
@@ -40,841 +38,46 @@ model = MEDFormer(
 )
 ```
-The signal-shape arguments above are example defaults — adjust them
-to match your recording.
 ## Documentation
-- Full API reference (parameters, references, architecture figure):
-  <https://braindecode.org/stable/generated/braindecode.models.MEDFormer.html>
-- Interactive browser with live instantiation:
   <https://huggingface.co/spaces/braindecode/model-explorer>
 - Source on GitHub: <https://github.com/braindecode/braindecode/blob/master/braindecode/models/medformer.py#L20>
-## Architecture description
-The block below is the rendered class docstring (parameters,
-references, architecture figure where available).
-<div class='bd-doc'><main>
-<p>Medformer from Wang et al (2024) <a class="citation-reference" href="#medformer2024" id="citation-reference-1" role="doc-biblioref">[Medformer2024]</a>.</p>
-<span style="display:inline-block;padding:2px 8px;border-radius:4px;background:#5cb85c;color:white;font-size:11px;font-weight:600;margin-right:4px;">Convolution</span><span style="display:inline-block;padding:2px 8px;border-radius:4px;background:#d9534f;color:white;font-size:11px;font-weight:600;margin-right:4px;">Foundation Model</span><figure class="align-center">
-<img alt="MEDFormer Architecture." src="https://raw.githubusercontent.com/DL4mHealth/Medformer/refs/heads/main/figs/medformer_architecture.png" />
-<figcaption>
-<p>a) Workflow. b) For the input sample <math xmlns="http://www.w3.org/1998/Math/MathML">
-  <msub>
-    <mi>x</mi>
-    <mtext>in</mtext>
-  </msub>
-</math>, the authors apply <math xmlns="http://www.w3.org/1998/Math/MathML">
-  <mi>n</mi>
-</math>
-different patch lengths in parallel to create patched features <math xmlns="http://www.w3.org/1998/Math/MathML">
-  <msubsup>
-    <mi>x</mi>
-    <mi>p</mi>
-    <mrow>
-      <mo stretchy="false">(</mo>
-      <mi>i</mi>
-      <mo stretchy="false">)</mo>
-    </mrow>
-  </msubsup>
-</math>, where <math xmlns="http://www.w3.org/1998/Math/MathML">
-  <mi>i</mi>
-</math>
-ranges from 1 to <math xmlns="http://www.w3.org/1998/Math/MathML">
-  <mi>n</mi>
-</math>. Each patch length represents a different granularity. These patched
-features are linearly transformed into <math xmlns="http://www.w3.org/1998/Math/MathML">
-  <msubsup>
-    <mi>x</mi>
-    <mi>e</mi>
-    <mrow>
-      <mo stretchy="false">(</mo>
-      <mi>i</mi>
-      <mo stretchy="false">)</mo>
-    </mrow>
-  </msubsup>
-</math> and augmented into <math xmlns="http://www.w3.org/1998/Math/MathML">
-  <msup>
-    <munderover>
-      <mi>x</mi>
-      <mi>e</mi>
-      <mo accent="true">~</mo>
-    </munderover>
-    <mrow>
-      <mo stretchy="false">(</mo>
-      <mi>i</mi>
-      <mo stretchy="false">)</mo>
-    </mrow>
-  </msup>
-</math>.
-c) The final patch embedding <math xmlns="http://www.w3.org/1998/Math/MathML">
-  <msup>
-    <mi>x</mi>
-    <mrow>
-      <mo stretchy="false">(</mo>
-      <mi>i</mi>
-      <mo stretchy="false">)</mo>
-    </mrow>
-  </msup>
-</math> fuses augmented <math xmlns="http://www.w3.org/1998/Math/MathML">
-  <msup>
-    <munderover>
-      <mi>x</mi>
-      <mi>e</mi>
-      <mo accent="true">~</mo>
-    </munderover>
-    <mrow>
-      <mo stretchy="false">(</mo>
-      <mi>i</mi>
-      <mo stretchy="false">)</mo>
-    </mrow>
-  </msup>
-</math> with the
-positional embedding <math xmlns="http://www.w3.org/1998/Math/MathML">
-  <msub>
-    <mi>W</mi>
-    <mtext>pos</mtext>
-  </msub>
-</math> and the granularity embedding <math xmlns="http://www.w3.org/1998/Math/MathML">
-  <msubsup>
-    <mi>W</mi>
-    <mtext>gr</mtext>
-    <mrow>
-      <mo stretchy="false">(</mo>
-      <mi>i</mi>
-      <mo stretchy="false">)</mo>
-    </mrow>
-  </msubsup>
-</math>.
-Each granularity employs a router <math xmlns="http://www.w3.org/1998/Math/MathML">
-  <msup>
-    <mi>u</mi>
-    <mrow>
-      <mo stretchy="false">(</mo>
-      <mi>i</mi>
-      <mo stretchy="false">)</mo>
-    </mrow>
-  </msup>
-</math> to capture aggregated information.
-Intra-granularity attention focuses within individual granularities, and inter-granularity attention
-leverages the routers to integrate information across granularities.</p>
-</figcaption>
-</figure>
-<p>The <strong>MedFormer</strong> is a multi-granularity patching transformer tailored to medical
-time-series (MedTS) classification, with an emphasis on EEG and ECG signals. It captures
-local temporal dynamics, inter-channel correlations, and multi-scale temporal structure
-through cross-channel patching, multi-granularity embeddings, and two-stage attention
-<a class="citation-reference" href="#medformer2024" id="citation-reference-2" role="doc-biblioref">[Medformer2024]</a>.</p>
-<p><strong>Architecture Overview</strong></p>
-<p>MedFormer integrates three mechanisms to enhance representation learning <a class="citation-reference" href="#medformer2024" id="citation-reference-3" role="doc-biblioref">[Medformer2024]</a>:</p>
-<ol class="arabic simple">
-<li><p><strong>Cross-channel patching.</strong> Leverages inter-channel correlations by forming patches
-across multiple channels and timestamps, capturing multi-timestamp and cross-channel
-patterns.</p></li>
-<li><p><strong>Multi-granularity embedding.</strong> Extracts features at different temporal scales from
-:attr:`patch_len_list`, emulating frequency-band behavior without hand-crafted filters.</p></li>
-<li><p><strong>Two-stage multi-granularity self-attention.</strong> Learns intra- and inter-granularity
-correlations to fuse information across temporal scales.</p></li>
-</ol>
-<p><strong>Macro Components</strong></p>
-<dl>
-<dt><span class="docutils literal">MEDFormer.enc_embedding</span> (Embedding Layer)</dt>
-<dd><p><strong>Operations.</strong> :class:`~braindecode.models.medformer._ListPatchEmbedding` implements
-cross-channel multi-granularity patching. For each patch length <math xmlns="http://www.w3.org/1998/Math/MathML">
-  <msub>
-    <mi>L</mi>
-    <mi>i</mi>
-  </msub>
-</math>, the input
-<math xmlns="http://www.w3.org/1998/Math/MathML">
-  <msub>
-    <mi>𝐱</mi>
-    <mtext>in</mtext>
-  </msub>
-  <mo>∈</mo>
-  <msup>
-    <mi>ℝ</mi>
-    <mrow>
-      <mi>T</mi>
-      <mo>×</mo>
-      <mi>C</mi>
-    </mrow>
-  </msup>
-</math> is segmented into
-<math xmlns="http://www.w3.org/1998/Math/MathML">
-  <msub>
-    <mi>N</mi>
-    <mi>i</mi>
-  </msub>
-</math> cross-channel non-overlapping patches
-<math xmlns="http://www.w3.org/1998/Math/MathML">
-  <msubsup>
-    <mi>𝐱</mi>
-    <mi>p</mi>
-    <mrow>
-      <mo stretchy="false">(</mo>
-      <mi>i</mi>
-      <mo stretchy="false">)</mo>
-    </mrow>
-  </msubsup>
-  <mo>∈</mo>
-  <msup>
-    <mi>ℝ</mi>
-    <mrow>
-      <msub>
-        <mi>N</mi>
-        <mi>i</mi>
-      </msub>
-      <mo>×</mo>
-      <mo stretchy="false">(</mo>
-      <msub>
-        <mi>L</mi>
-        <mi>i</mi>
-      </msub>
-      <mo>⋅</mo>
-      <mi>C</mi>
-      <mo stretchy="false">)</mo>
-    </mrow>
-  </msup>
-</math>, where
-<math xmlns="http://www.w3.org/1998/Math/MathML">
-  <msub>
-    <mi>N</mi>
-    <mi>i</mi>
-  </msub>
-  <mo>=</mo>
-  <mo>⌈</mo>
-  <mi>T</mi>
-  <mo stretchy="false">/</mo>
-  <msub>
-    <mi>L</mi>
-    <mi>i</mi>
-  </msub>
-  <mo>⌉</mo>
-</math>. Each patch is linearly projected via
-:class:`~braindecode.models.medformer._CrossChannelTokenEmbedding` to obtain
-<math xmlns="http://www.w3.org/1998/Math/MathML">
-  <msubsup>
-    <mi>𝐱</mi>
-    <mi>e</mi>
-    <mrow>
-      <mo stretchy="false">(</mo>
-      <mi>i</mi>
-      <mo stretchy="false">)</mo>
-    </mrow>
-  </msubsup>
-  <mo>∈</mo>
-  <msup>
-    <mi>ℝ</mi>
-    <mrow>
-      <msub>
-        <mi>N</mi>
-        <mi>i</mi>
-      </msub>
-      <mo>×</mo>
-      <mi>D</mi>
-    </mrow>
-  </msup>
-</math>. Data augmentations
-(masking, jittering) produce augmented embeddings <math xmlns="http://www.w3.org/1998/Math/MathML">
-  <msup>
-    <munderover>
-      <mi>𝐱</mi>
-      <mi>e</mi>
-      <mo stretchy="false">~</mo>
-    </munderover>
-    <mrow>
-      <mo stretchy="false">(</mo>
-      <mi>i</mi>
-      <mo stretchy="false">)</mo>
-    </mrow>
-  </msup>
-</math>.
-The final embedding combines augmented patches, fixed positional embeddings
-(:class:`~braindecode.models.medformer._PositionalEmbedding`), and learnable
-granularity embeddings <math xmlns="http://www.w3.org/1998/Math/MathML">
-  <msubsup>
-    <mi>𝐖</mi>
-    <mtext>gr</mtext>
-    <mrow>
-      <mo stretchy="false">(</mo>
-      <mi>i</mi>
-      <mo stretchy="false">)</mo>
-    </mrow>
-  </msubsup>
-</math>:</p>
-<div>
-<math xmlns="http://www.w3.org/1998/Math/MathML" display="block">
-  <msup>
-    <mi>𝐱</mi>
-    <mrow>
-      <mo stretchy="false">(</mo>
-      <mi>i</mi>
-      <mo stretchy="false">)</mo>
-    </mrow>
-  </msup>
-  <mo>=</mo>
-  <msup>
-    <munderover>
-      <mi>𝐱</mi>
-      <mi>e</mi>
-      <mo stretchy="false">~</mo>
-    </munderover>
-    <mrow>
-      <mo stretchy="false">(</mo>
-      <mi>i</mi>
-      <mo stretchy="false">)</mo>
-    </mrow>
-  </msup>
-  <mo>+</mo>
-  <msub>
-    <mi>𝐖</mi>
-    <mtext>pos</mtext>
-  </msub>
-  <mo stretchy="false">[</mo>
-  <mn>1</mn>
-  <mo>∶</mo>
-  <msub>
-    <mi>N</mi>
-    <mi>i</mi>
-  </msub>
-  <mo stretchy="false">]</mo>
-  <mo>+</mo>
-  <msubsup>
-    <mi>𝐖</mi>
-    <mtext>gr</mtext>
-    <mrow>
-      <mo stretchy="false">(</mo>
-      <mi>i</mi>
-      <mo stretchy="false">)</mo>
-    </mrow>
-  </msubsup>
-</math>
-</div>
-<p>Additionally, a router token is initialized for each granularity:</p>
-<div>
-<math xmlns="http://www.w3.org/1998/Math/MathML" display="block">
-  <msup>
-    <mi>𝐮</mi>
-    <mrow>
-      <mo stretchy="false">(</mo>
-      <mi>i</mi>
-      <mo stretchy="false">)</mo>
-    </mrow>
-  </msup>
-  <mo>=</mo>
-  <msub>
-    <mi>𝐖</mi>
-    <mtext>pos</mtext>
-  </msub>
-  <mo stretchy="false">[</mo>
-  <msub>
-    <mi>N</mi>
-    <mi>i</mi>
-  </msub>
-  <mo>+</mo>
-  <mn>1</mn>
-  <mo stretchy="false">]</mo>
-  <mo>+</mo>
-  <msubsup>
-    <mi>𝐖</mi>
-    <mtext>gr</mtext>
-    <mrow>
-      <mo stretchy="false">(</mo>
-      <mi>i</mi>
-      <mo stretchy="false">)</mo>
-    </mrow>
-  </msubsup>
-</math>
-</div>
-<p><strong>Role.</strong> Converts raw input into granularity-specific patch embeddings
-<math xmlns="http://www.w3.org/1998/Math/MathML">
-  <mo>{</mo>
-  <msup>
-    <mi>𝐱</mi>
-    <mrow>
-      <mo stretchy="false">(</mo>
-      <mn>1</mn>
-      <mo stretchy="false">)</mo>
-    </mrow>
-  </msup>
-  <mo>,</mo>
-  <mi>…</mi>
-  <mo>,</mo>
-  <msup>
-    <mi>𝐱</mi>
-    <mrow>
-      <mo stretchy="false">(</mo>
-      <mi>n</mi>
-      <mo stretchy="false">)</mo>
-    </mrow>
-  </msup>
-  <mo>}</mo>
-</math> and router embeddings
-<math xmlns="http://www.w3.org/1998/Math/MathML">
-  <mo>{</mo>
-  <msup>
-    <mi>𝐮</mi>
-    <mrow>
-      <mo stretchy="false">(</mo>
-      <mn>1</mn>
-      <mo stretchy="false">)</mo>
-    </mrow>
-  </msup>
-  <mo>,</mo>
-  <mi>…</mi>
-  <mo>,</mo>
-  <msup>
-    <mi>𝐮</mi>
-    <mrow>
-      <mo stretchy="false">(</mo>
-      <mi>n</mi>
-      <mo stretchy="false">)</mo>
-    </mrow>
-  </msup>
-  <mo>}</mo>
-</math> for multi-scale processing.</p>
-</dd>
-<dt><span class="docutils literal">MEDFormer.encoder</span> (Transformer Encoder Stack)</dt>
-<dd><p><strong>Operations.</strong> A stack of :class:`~braindecode.models.medformer._EncoderLayer` modules,
-each containing a :class:`~braindecode.models.medformer._MedformerLayer` that implements
-two-stage self-attention. The two-stage mechanism splits self-attention into:</p>
-<p><strong>(a) Intra-Granularity Self-Attention.</strong> For granularity <math xmlns="http://www.w3.org/1998/Math/MathML">
-  <mi>i</mi>
-</math>, the patch embedding
-<math xmlns="http://www.w3.org/1998/Math/MathML">
-  <msup>
-    <mi>𝐱</mi>
-    <mrow>
-      <mo stretchy="false">(</mo>
-      <mi>i</mi>
-      <mo stretchy="false">)</mo>
-    </mrow>
-  </msup>
-  <mo>∈</mo>
-  <msup>
-    <mi>ℝ</mi>
-    <mrow>
-      <msub>
-        <mi>N</mi>
-        <mi>i</mi>
-      </msub>
-      <mo>×</mo>
-      <mi>D</mi>
-    </mrow>
-  </msup>
-</math> and router embedding
-<math xmlns="http://www.w3.org/1998/Math/MathML">
-  <msup>
-    <mi>𝐮</mi>
-    <mrow>
-      <mo stretchy="false">(</mo>
-      <mi>i</mi>
-      <mo stretchy="false">)</mo>
-    </mrow>
-  </msup>
-  <mo>∈</mo>
-  <msup>
-    <mi>ℝ</mi>
-    <mrow>
-      <mn>1</mn>
-      <mo>×</mo>
-      <mi>D</mi>
-    </mrow>
-  </msup>
-</math> are concatenated:</p>
-<div>
-<math xmlns="http://www.w3.org/1998/Math/MathML" display="block">
-  <msup>
-    <mi>𝐳</mi>
-    <mrow>
-      <mo stretchy="false">(</mo>
-      <mi>i</mi>
-      <mo stretchy="false">)</mo>
-    </mrow>
-  </msup>
-  <mo>=</mo>
-  <mo stretchy="false">[</mo>
-  <msup>
-    <mi>𝐱</mi>
-    <mrow>
-      <mo stretchy="false">(</mo>
-      <mi>i</mi>
-      <mo stretchy="false">)</mo>
-    </mrow>
-  </msup>
-  <mo>‖</mo>
-  <msup>
-    <mi>𝐮</mi>
-    <mrow>
-      <mo stretchy="false">(</mo>
-      <mi>i</mi>
-      <mo stretchy="false">)</mo>
-    </mrow>
-  </msup>
-  <mo stretchy="false">]</mo>
-  <mo>∈</mo>
-  <msup>
-    <mi>ℝ</mi>
-    <mrow>
-      <mo stretchy="false">(</mo>
-      <msub>
-        <mi>N</mi>
-        <mi>i</mi>
-      </msub>
-      <mo>+</mo>
-      <mn>1</mn>
-      <mo stretchy="false">)</mo>
-      <mo>×</mo>
-      <mi>D</mi>
-    </mrow>
-  </msup>
-</math>
-</div>
-<p>Self-attention is applied to update both embeddings:</p>
-<div>
-<math xmlns="http://www.w3.org/1998/Math/MathML" display="block">
-  <mtable class="ams-align" displaystyle="true">
-    <mtr>
-      <mtd>
-        <msup>
-          <mi>𝐱</mi>
-          <mrow>
-            <mo stretchy="false">(</mo>
-            <mi>i</mi>
-            <mo stretchy="false">)</mo>
-          </mrow>
-        </msup>
-      </mtd>
-      <mtd>
-        <mo>←</mo>
-        <msub>
-          <mtext>Attn</mtext>
-          <mtext>intra</mtext>
-        </msub>
-        <mo stretchy="false">(</mo>
-        <msup>
-          <mi>𝐱</mi>
-          <mrow>
-            <mo stretchy="false">(</mo>
-            <mi>i</mi>
-            <mo stretchy="false">)</mo>
-          </mrow>
-        </msup>
-        <mo>,</mo>
-        <msup>
-          <mi>𝐳</mi>
-          <mrow>
-            <mo stretchy="false">(</mo>
-            <mi>i</mi>
-            <mo stretchy="false">)</mo>
-          </mrow>
-        </msup>
-        <mo>,</mo>
-        <msup>
-          <mi>𝐳</mi>
-          <mrow>
-            <mo stretchy="false">(</mo>
-            <mi>i</mi>
-            <mo stretchy="false">)</mo>
-          </mrow>
-        </msup>
-        <mo stretchy="false">)</mo>
-      </mtd>
-    </mtr>
-    <mtr>
-      <mtd>
-        <msup>
-          <mi>𝐮</mi>
-          <mrow>
-            <mo stretchy="false">(</mo>
-            <mi>i</mi>
-            <mo stretchy="false">)</mo>
-          </mrow>
-        </msup>
-      </mtd>
-      <mtd>
-        <mo>←</mo>
-        <msub>
-          <mtext>Attn</mtext>
-          <mtext>intra</mtext>
-        </msub>
-        <mo stretchy="false">(</mo>
-        <msup>
-          <mi>𝐮</mi>
-          <mrow>
-            <mo stretchy="false">(</mo>
-            <mi>i</mi>
-            <mo stretchy="false">)</mo>
-          </mrow>
-        </msup>
-        <mo>,</mo>
-        <msup>
-          <mi>𝐳</mi>
-          <mrow>
-            <mo stretchy="false">(</mo>
-            <mi>i</mi>
-            <mo stretchy="false">)</mo>
-          </mrow>
-        </msup>
-        <mo>,</mo>
-        <msup>
-          <mi>𝐳</mi>
-          <mrow>
-            <mo stretchy="false">(</mo>
-            <mi>i</mi>
-            <mo stretchy="false">)</mo>
-          </mrow>
-        </msup>
-        <mo stretchy="false">)</mo>
-      </mtd>
-    </mtr>
-  </mtable>
-</math>
-</div>
-<p>This captures temporal features within each granularity independently.</p>
-<p><strong>(b) Inter-Granularity Self-Attention.</strong> All router embeddings are concatenated:</p>
-<div>
-<math xmlns="http://www.w3.org/1998/Math/MathML" display="block">
-  <mi>𝐔</mi>
-  <mo>=</mo>
-  <mo stretchy="false">[</mo>
-  <msup>
-    <mi>𝐮</mi>
-    <mrow>
-      <mo stretchy="false">(</mo>
-      <mn>1</mn>
-      <mo stretchy="false">)</mo>
-    </mrow>
-  </msup>
-  <mo>‖</mo>
-  <msup>
-    <mi>𝐮</mi>
-    <mrow>
-      <mo stretchy="false">(</mo>
-      <mn>2</mn>
-      <mo stretchy="false">)</mo>
-    </mrow>
-  </msup>
-  <mo>‖</mo>
-  <mi>⋯</mi>
-  <mo>‖</mo>
-  <msup>
-    <mi>𝐮</mi>
-    <mrow>
-      <mo stretchy="false">(</mo>
-      <mi>n</mi>
-      <mo stretchy="false">)</mo>
-    </mrow>
-  </msup>
-  <mo stretchy="false">]</mo>
-  <mo>∈</mo>
-  <msup>
-    <mi>ℝ</mi>
-    <mrow>
-      <mi>n</mi>
-      <mo>×</mo>
-      <mi>D</mi>
-    </mrow>
-  </msup>
-</math>
-</div>
-<p>Self-attention among routers exchanges information across granularities:</p>
-<div>
-<math xmlns="http://www.w3.org/1998/Math/MathML" display="block">
-  <msup>
-    <mi>𝐮</mi>
-    <mrow>
-      <mo stretchy="false">(</mo>
-      <mi>i</mi>
-      <mo stretchy="false">)</mo>
-    </mrow>
-  </msup>
-  <mo>←</mo>
-  <msub>
-    <mtext>Attn</mtext>
-    <mtext>inter</mtext>
-  </msub>
-  <mo stretchy="false">(</mo>
-  <msup>
-    <mi>𝐮</mi>
-    <mrow>
-      <mo stretchy="false">(</mo>
-      <mi>i</mi>
-      <mo stretchy="false">)</mo>
-    </mrow>
-  </msup>
-  <mo>,</mo>
-  <mi>𝐔</mi>
-  <mo>,</mo>
-  <mi>𝐔</mi>
-  <mo stretchy="false">)</mo>
-</math>
-</div>
-<p><strong>Role.</strong> Learns representations and correlations within and across temporal scales while
-reducing complexity from <math xmlns="http://www.w3.org/1998/Math/MathML">
-  <mi>O</mi>
-  <mo stretchy="false">(</mo>
-  <mo stretchy="false">(</mo>
-  <munder>
-    <mo movablelimits="true">∑</mo>
-    <mi>i</mi>
-  </munder>
-  <msub>
-    <mi>N</mi>
-    <mi>i</mi>
-  </msub>
-  <msup>
-    <mo stretchy="false">)</mo>
-    <mn>2</mn>
-  </msup>
-  <mo stretchy="false">)</mo>
-</math> to
-<math xmlns="http://www.w3.org/1998/Math/MathML">
-  <mi>O</mi>
-  <mo stretchy="false">(</mo>
-  <munder>
-    <mo movablelimits="true">∑</mo>
-    <mi>i</mi>
-  </munder>
-  <msubsup>
-    <mi>N</mi>
-    <mi>i</mi>
-    <mn>2</mn>
-  </msubsup>
-  <mo>+</mo>
-  <msup>
-    <mi>n</mi>
-    <mn>2</mn>
-  </msup>
-  <mo stretchy="false">)</mo>
-</math> through the router mechanism.</p>
-</dd>
-</dl>
-<p><strong>Temporal, Spatial, and Spectral Encoding</strong></p>
-<ul class="simple">
-<li><p><strong>Temporal:</strong> Multiple patch lengths in :attr:`patch_len_list` capture features at several
-temporal granularities, while intra-granularity attention supports long-range temporal
-dependencies.</p></li>
-<li><p><strong>Spatial:</strong> Cross-channel patching embeds inter-channel dependencies by applying kernels
-that span every input channel.</p></li>
-<li><p><strong>Spectral:</strong> Differing patch lengths simulate multiple sampling frequencies analogous to
-clinically relevant bands (e.g., alpha, beta, gamma).</p></li>
-</ul>
-<p><strong>Additional Mechanisms</strong></p>
-<ul class="simple">
-<li><p><strong>Granularity router:</strong> Each granularity <math xmlns="http://www.w3.org/1998/Math/MathML">
-  <mi>i</mi>
-</math> receives a dedicated router token
-<math xmlns="http://www.w3.org/1998/Math/MathML">
-  <msup>
-    <mi>𝐮</mi>
-    <mrow>
-      <mo stretchy="false">(</mo>
-      <mi>i</mi>
-      <mo stretchy="false">)</mo>
-    </mrow>
-  </msup>
-</math>. Intra-attention updates the token, and inter-attention exchanges
-aggregated information across scales.</p></li>
-<li><p><strong>Complexity:</strong> Router-mediated two-stage attention maintains <math xmlns="http://www.w3.org/1998/Math/MathML">
-  <mi>O</mi>
-  <mo stretchy="false">(</mo>
-  <msup>
-    <mi>T</mi>
-    <mn>2</mn>
-  </msup>
-  <mo stretchy="false">)</mo>
-</math> complexity for
-suitable patch lengths (e.g., power series), preserving transformer-like efficiency while
-modeling multiple granularities.</p></li>
-</ul>
-<section id="parameters">
-<h2>Parameters</h2>
-<dl class="simple">
-<dt>patch_len_list<span class="classifier">list of int, optional</span></dt>
-<dd><p>Patch lengths for multi-granularity patching; each entry selects a temporal scale.
-The default is <span class="docutils literal">[14, 44, 45]</span>.</p>
-</dd>
-<dt>embed_dim<span class="classifier">int, optional</span></dt>
-<dd><p>Embedding dimensionality. The default is <span class="docutils literal">128</span>.</p>
-</dd>
-<dt>num_heads<span class="classifier">int, optional</span></dt>
-<dd><p>Number of attention heads, which must divide :attr:`d_model`. The default is <span class="docutils literal">8</span>.</p>
-</dd>
-<dt>drop_prob<span class="classifier">float, optional</span></dt>
-<dd><p>Dropout probability. The default is <span class="docutils literal">0.1</span>.</p>
-</dd>
-<dt>no_inter_attn<span class="classifier">bool, optional</span></dt>
-<dd><p>If <span class="docutils literal">True</span>, disables inter-granularity attention. The default is <span class="docutils literal">False</span>.</p>
-</dd>
-<dt>num_layers<span class="classifier">int, optional</span></dt>
-<dd><p>Number of encoder layers. The default is <span class="docutils literal">6</span>.</p>
-</dd>
-<dt>dim_feedforward<span class="classifier">int, optional</span></dt>
-<dd><p>Feedforward dimensionality. The default is <span class="docutils literal">256</span>.</p>
-</dd>
-<dt>activation_trans<span class="classifier">nn.Module, optional</span></dt>
-<dd><p>Activation module used in transformer encoder layers. The default is :class:`nn.ReLU`.</p>
-</dd>
-<dt>single_channel<span class="classifier">bool, optional</span></dt>
-<dd><p>If <span class="docutils literal">True</span>, processes each channel independently, increasing capacity and cost. The default is <span class="docutils literal">False</span>.</p>
-</dd>
-<dt>output_attention<span class="classifier">bool, optional</span></dt>
-<dd><p>If <span class="docutils literal">True</span>, returns attention weights for interpretability. The default is <span class="docutils literal">True</span>.</p>
-</dd>
-<dt>activation_class<span class="classifier">nn.Module, optional</span></dt>
-<dd><p>Activation used in the final classification layer. The default is :class:`nn.GELU`.</p>
-</dd>
-</dl>
-</section>
-<section id="notes">
-<h2>Notes</h2>
-<ul class="simple">
-<li><p>MedFormer outperforms strong baselines across six metrics on five MedTS datasets in a
-subject-independent evaluation <a class="citation-reference" href="#medformer2024" id="citation-reference-4" role="doc-biblioref">[Medformer2024]</a>.</p></li>
-<li><p>Cross-channel patching provides the largest F1 improvement in ablation studies (average
-+6.10%), highlighting its importance for MedTS tasks <a class="citation-reference" href="#medformer2024" id="citation-reference-5" role="doc-biblioref">[Medformer2024]</a>.</p></li>
-<li><p>Setting :attr:`no_inter_attn` to <span class="docutils literal">True</span> disables inter-granularity attention while retaining
-intra-granularity attention.</p></li>
-</ul>
-</section>
-<section id="references">
-<h2>References</h2>
-<div role="list" class="citation-list">
-<div class="citation" id="medformer2024" role="doc-biblioentry">
-<span class="label"><span class="fn-bracket">[</span>Medformer2024<span class="fn-bracket">]</span></span>
-<span class="backrefs">(<a role="doc-backlink" href="#citation-reference-1">1</a>,<a role="doc-backlink" href="#citation-reference-2">2</a>,<a role="doc-backlink" href="#citation-reference-3">3</a>,<a role="doc-backlink" href="#citation-reference-4">4</a>,<a role="doc-backlink" href="#citation-reference-5">5</a>)</span>
-<p>Wang, Y., Huang, N., Li, T., Yan, Y., &amp; Zhang, X. (2024).
-Medformer: A Multi-Granularity Patching Transformer for Medical Time-Series Classification.
-In A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, &amp; C. Zhang (Eds.),
-Advances in Neural Information Processing Systems (Vol. 37, pp. 36314-36341).
-doi:10.52202/079017-1145.</p>
-</div>
-</div>
-<p><strong>Hugging Face Hub integration</strong></p>
-<p>When the optional <span class="docutils literal">huggingface_hub</span> package is installed, all models
-automatically gain the ability to be pushed to and loaded from the
-Hugging Face Hub. Install with:</p>
-<pre class="literal-block">pip install braindecode[hub]</pre>
-<p><strong>Pushing a model to the Hub:</strong></p>
-<p><strong>Loading a model from the Hub:</strong></p>
-<p><strong>Extracting features and replacing the head:</strong></p>
-<p><strong>Saving and restoring full configuration:</strong></p>
-<p>All model parameters (both EEG-specific and model-specific such as
-dropout rates, activation functions, number of filters) are automatically
-saved to the Hub and restored when loading.</p>
-<p>See :ref:`load-pretrained-models` for a complete tutorial.</p>
-</section>
-</main>
-</div>
 ## Citation
-Please cite both the original paper for this architecture (see the
-*References* section above) and braindecode:
 ```bibtex
 @article{aristimunha2025braindecode,

   - braindecode
   - foundation-model
   - convolutional
 ---
 # MEDFormer
+Medformer from Wang et al. (2024) [Medformer2024].
+> **Architecture-only repository.** Documents the
 > `braindecode.models.MEDFormer` class. **No pretrained weights are
+> distributed here.** Instantiate the model and train it on your own
+> data.
 ## Quick start
 )
 ```
+The signal-shape arguments above are illustrative defaults; adjust them
+to match your recording.
 ## Documentation
+- Full API reference: <https://braindecode.org/stable/generated/braindecode.models.MEDFormer.html>
+- Interactive browser (live instantiation, parameter counts):
   <https://huggingface.co/spaces/braindecode/model-explorer>
 - Source on GitHub: <https://github.com/braindecode/braindecode/blob/master/braindecode/models/medformer.py#L20>
+## Architecture
+![MEDFormer architecture](https://raw.githubusercontent.com/DL4mHealth/Medformer/refs/heads/main/figs/medformer_architecture.png)
+## Parameters
+| Parameter | Type | Description |
+|---|---|---|
+| `patch_len_list` | list of int, optional | Patch lengths for multi-granularity patching; each entry selects a temporal scale. The default is `[14, 44, 45]`. |
+| `embed_dim` | int, optional | Embedding dimensionality. The default is `128`. |
+| `num_heads` | int, optional | Number of attention heads, which must divide the embedding dimension (`embed_dim`). The default is `8`. |
+| `drop_prob` | float, optional | Dropout probability. The default is `0.1`. |
+| `no_inter_attn` | bool, optional | If `True`, disables inter-granularity attention. The default is `False`. |
+| `num_layers` | int, optional | Number of encoder layers. The default is `6`. |
+| `dim_feedforward` | int, optional | Feedforward dimensionality. The default is `256`. |
+| `activation_trans` | nn.Module, optional | Activation module used in transformer encoder layers. The default is `nn.ReLU`. |
+| `single_channel` | bool, optional | If `True`, processes each channel independently, increasing capacity and cost. The default is `False`. |
+| `output_attention` | bool, optional | If `True`, returns attention weights for interpretability. The default is `True`. |
+| `activation_class` | nn.Module, optional | Activation used in the final classification layer. The default is `nn.GELU`. |
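To give a feel for what `patch_len_list` controls, the sketch below computes how many patches each granularity produces, using the relation N_i = ceil(T / L_i) given in the class docstring, and compares the docstring's two attention-cost expressions: (sum of N_i)^2 for joint attention against sum of N_i^2 plus n^2 for the two-stage scheme. It is a rough illustration, not part of braindecode: the helper names `patch_counts` and `attention_costs` are hypothetical, the recording length of 1000 samples is an assumed value, and constant factors are ignored.

```python
import math


def patch_counts(n_times, patch_len_list):
    """Return N_i = ceil(T / L_i) for each patch length L_i (hypothetical helper)."""
    return [math.ceil(n_times / patch_len) for patch_len in patch_len_list]


def attention_costs(counts):
    """Return (joint, two_stage) token-pair counts, ignoring constant factors.

    joint     = (sum_i N_i)^2       -- one attention over all patches
    two_stage = sum_i N_i^2 + n^2   -- intra-granularity plus router attention
    """
    joint = sum(counts) ** 2
    two_stage = sum(n * n for n in counts) + len(counts) ** 2
    return joint, two_stage


# Assumed recording length of 1000 samples with the default patch lengths.
counts = patch_counts(1000, [14, 44, 45])
joint, two_stage = attention_costs(counts)
print(counts, joint, two_stage)
```

With these assumed inputs, the shortest patch length dominates the token count, so adding very short patch lengths is the main driver of attention cost.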
+## References
+1. Wang, Y., Huang, N., Li, T., Yan, Y., & Zhang, X. (2024). Medformer: A Multi-Granularity Patching Transformer for Medical Time-Series Classification. In A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, & C. Zhang (Eds.), Advances in Neural Information Processing Systems (Vol. 37, pp. 36314-36341). doi:10.52202/079017-1145.
 ## Citation
+Cite the original architecture paper (see *References* above) and braindecode:
 ```bibtex
 @article{aristimunha2025braindecode,