---
license: bsd-3-clause
library_name: braindecode
pipeline_tag: feature-extraction
tags:
- eeg
- biosignal
- pytorch
- neuroscience
- braindecode
- foundation-model
- convolutional
- transformer
---

# MEDFormer

Medformer from Wang et al. (2024).

> **Architecture-only repository.** This repo documents the
> `braindecode.models.MEDFormer` class. **No pretrained weights are
> distributed here** — instantiate the model and train it on your own
> data, or fine-tune from a published foundation-model checkpoint
> separately.

## Quick start

```bash
pip install braindecode
```

```python
from braindecode.models import MEDFormer

model = MEDFormer(
    n_chans=22,
    sfreq=250,
    input_window_seconds=4.0,
    n_outputs=4,
)
```

The signal-shape arguments above are example defaults — adjust them
to match your recording.
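
As a quick sanity check you can run a dummy forward pass. This is a minimal
sketch assuming the standard braindecode convention of
`(batch, n_chans, n_times)` input tensors and a `(batch, n_outputs)` output:

```python
import torch

# 8 windows, 22 channels, 4 s at 250 Hz = 1000 samples per window.
x = torch.randn(8, 22, 1000)
y = model(x)
print(y.shape)  # expected: torch.Size([8, 4])
```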

## Documentation

- Full API reference (parameters, references, architecture figure):
  <https://braindecode.org/stable/generated/braindecode.models.MEDFormer.html>
- Interactive browser with live instantiation:
  <https://huggingface.co/spaces/braindecode/model-explorer>
- Source on GitHub: <https://github.com/braindecode/braindecode/blob/master/braindecode/models/medformer.py#L20>

## Architecture description

The section below is adapted from the class docstring (parameters,
references, architecture figure where available).

Medformer from Wang et al. (2024) [Medformer2024].

*Keywords: Convolution, Foundation Model*

![MEDFormer Architecture.](https://raw.githubusercontent.com/DL4mHealth/Medformer/refs/heads/main/figs/medformer_architecture.png)

*Figure: a) Workflow. b) For the input sample $x_\text{in}$, the authors apply $n$ different patch lengths in parallel to create patched features $x_p^{(i)}$, where $i$ ranges from 1 to $n$. Each patch length represents a different granularity. These patched features are linearly transformed into $x_e^{(i)}$ and augmented into $\tilde{x}_e^{(i)}$. c) The final patch embedding $x^{(i)}$ fuses the augmented $\tilde{x}_e^{(i)}$ with the positional embedding $W_\text{pos}$ and the granularity embedding $W_\text{gr}^{(i)}$. Each granularity employs a router $u^{(i)}$ to capture aggregated information. Intra-granularity attention focuses within individual granularities, and inter-granularity attention leverages the routers to integrate information across granularities.*

The **MedFormer** is a multi-granularity patching transformer tailored to medical
time-series (MedTS) classification, with an emphasis on EEG and ECG signals. It captures
local temporal dynamics, inter-channel correlations, and multi-scale temporal structure
through cross-channel patching, multi-granularity embeddings, and two-stage attention
[Medformer2024].

**Architecture Overview**

MedFormer integrates three mechanisms to enhance representation learning [Medformer2024]:

1. **Cross-channel patching.** Leverages inter-channel correlations by forming patches
   across multiple channels and timestamps, capturing multi-timestamp and cross-channel
   patterns (see the sketch after this list).
2. **Multi-granularity embedding.** Extracts features at different temporal scales from
   `patch_len_list`, emulating frequency-band behavior without hand-crafted filters.
3. **Two-stage multi-granularity self-attention.** Learns intra- and inter-granularity
   correlations to fuse information across temporal scales.
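
To make the patching concrete, the sketch below (an illustration of the idea,
not braindecode's internal code) segments a `(T, C)` window into cross-channel
patches of length `L_i` and counts the $N_i = \lceil T / L_i \rceil$ tokens
each granularity produces:

```python
import math
import torch
import torch.nn.functional as F

T, C = 1000, 22                # e.g., 4 s at 250 Hz, 22 channels
patch_len_list = [14, 44, 45]  # the default granularities

x_in = torch.randn(T, C)
for L_i in patch_len_list:
    N_i = math.ceil(T / L_i)
    # Zero-pad so T divides evenly, then fold into N_i patches, each
    # flattening L_i consecutive timestamps across all C channels.
    pad = N_i * L_i - T
    x_pad = F.pad(x_in, (0, 0, 0, pad))   # pad the time dimension
    x_p = x_pad.reshape(N_i, L_i * C)     # (N_i, L_i * C)
    print(f"L_i={L_i:2d} -> N_i={N_i:2d} patches of dim {L_i * C}")
```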

**Macro Components**

**`MEDFormer.enc_embedding` (Embedding Layer)**

**Operations.** `_ListPatchEmbedding` implements cross-channel multi-granularity
patching. For each patch length $L_i$, the input
$\mathbf{x}_\text{in} \in \mathbb{R}^{T \times C}$ is segmented into $N_i$
cross-channel non-overlapping patches
$\mathbf{x}_p^{(i)} \in \mathbb{R}^{N_i \times (L_i \cdot C)}$, where
$N_i = \lceil T / L_i \rceil$. Each patch is linearly projected via
`_CrossChannelTokenEmbedding` to obtain
$\mathbf{x}_e^{(i)} \in \mathbb{R}^{N_i \times D}$. Data augmentations
(masking, jittering) produce augmented embeddings $\tilde{\mathbf{x}}_e^{(i)}$.
The final embedding combines augmented patches, fixed positional embeddings
(`_PositionalEmbedding`), and learnable granularity embeddings
$\mathbf{W}_\text{gr}^{(i)}$:

$$\mathbf{x}^{(i)} = \tilde{\mathbf{x}}_e^{(i)} + \mathbf{W}_\text{pos}[1 : N_i] + \mathbf{W}_\text{gr}^{(i)}$$

Additionally, a router token is initialized for each granularity:

$$\mathbf{u}^{(i)} = \mathbf{W}_\text{pos}[N_i + 1] + \mathbf{W}_\text{gr}^{(i)}$$

**Role.** Converts raw input into granularity-specific patch embeddings
$\{\mathbf{x}^{(1)}, \ldots, \mathbf{x}^{(n)}\}$ and router embeddings
$\{\mathbf{u}^{(1)}, \ldots, \mathbf{u}^{(n)}\}$ for multi-scale processing.

**`MEDFormer.encoder` (Transformer Encoder Stack)**

**Operations.** A stack of `_EncoderLayer` modules, each containing a
`_MedformerLayer` that implements two-stage self-attention. The two-stage
mechanism splits self-attention into:

**(a) Intra-Granularity Self-Attention.** For granularity $i$, the patch
embedding $\mathbf{x}^{(i)} \in \mathbb{R}^{N_i \times D}$ and router embedding
$\mathbf{u}^{(i)} \in \mathbb{R}^{1 \times D}$ are concatenated:

$$\mathbf{z}^{(i)} = [\mathbf{x}^{(i)} \,\|\, \mathbf{u}^{(i)}] \in \mathbb{R}^{(N_i + 1) \times D}$$

Self-attention is applied to update both embeddings:

$$\begin{aligned}
\mathbf{x}^{(i)} &\leftarrow \operatorname{Attn}_\text{intra}(\mathbf{x}^{(i)}, \mathbf{z}^{(i)}, \mathbf{z}^{(i)}) \\
\mathbf{u}^{(i)} &\leftarrow \operatorname{Attn}_\text{intra}(\mathbf{u}^{(i)}, \mathbf{z}^{(i)}, \mathbf{z}^{(i)})
\end{aligned}$$

This captures temporal features within each granularity independently.

**(b) Inter-Granularity Self-Attention.** All router embeddings are concatenated:

$$\mathbf{U} = [\mathbf{u}^{(1)} \,\|\, \mathbf{u}^{(2)} \,\|\, \cdots \,\|\, \mathbf{u}^{(n)}] \in \mathbb{R}^{n \times D}$$

Self-attention among routers exchanges information across granularities:

$$\mathbf{u}^{(i)} \leftarrow \operatorname{Attn}_\text{inter}(\mathbf{u}^{(i)}, \mathbf{U}, \mathbf{U})$$

**Role.** Learns representations and correlations within and across temporal
scales while reducing complexity from $O\big((\sum_i N_i)^2\big)$ to
$O\big(\sum_i N_i^2 + n^2\big)$ through the router mechanism. A toy sketch of
this two-stage scheme follows.
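
The two-stage scheme can be sketched with off-the-shelf
`torch.nn.MultiheadAttention`. This is an illustration under assumed shapes,
not the `_MedformerLayer` implementation:

```python
import torch
import torch.nn as nn

D, n_heads = 128, 8
attn_intra = nn.MultiheadAttention(D, n_heads, batch_first=True)
attn_inter = nn.MultiheadAttention(D, n_heads, batch_first=True)

# Patch embeddings for three granularities (batch of 1, N_i tokens each)
# and one router token per granularity.
xs = [torch.randn(1, N_i, D) for N_i in (72, 23, 23)]
us = [torch.randn(1, 1, D) for _ in xs]

# (a) Intra-granularity: patches and router attend within z = [x || u].
for i, (x, u) in enumerate(zip(xs, us)):
    z = torch.cat([x, u], dim=1)       # (1, N_i + 1, D)
    xs[i], _ = attn_intra(x, z, z)     # queries: the patches
    us[i], _ = attn_intra(u, z, z)     # query: the router

# (b) Inter-granularity: routers exchange information among themselves.
U = torch.cat(us, dim=1)               # (1, n, D)
U, _ = attn_inter(U, U, U)
us = list(U.split(1, dim=1))
```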

**Temporal, Spatial, and Spectral Encoding**

- **Temporal:** Multiple patch lengths in `patch_len_list` capture features at several
  temporal granularities, while intra-granularity attention supports long-range temporal
  dependencies.
- **Spatial:** Cross-channel patching embeds inter-channel dependencies by applying
  kernels that span every input channel.
- **Spectral:** Differing patch lengths simulate multiple sampling frequencies analogous
  to clinically relevant bands (e.g., alpha, beta, gamma).

**Additional Mechanisms**

- **Granularity router:** Each granularity $i$ receives a dedicated router token
  $\mathbf{u}^{(i)}$. Intra-attention updates the token, and inter-attention exchanges
  aggregated information across scales.
- **Complexity:** Router-mediated two-stage attention maintains $O(T^2)$ complexity for
  suitable patch lengths (e.g., power series), preserving transformer-like efficiency
  while modeling multiple granularities (worked example below).
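
As a worked example of the saving, take the default
`patch_len_list = [14, 44, 45]` on a 1000-sample window (illustrative numbers,
assuming $N_i = \lceil T / L_i \rceil$): the patch counts are
$N = (72, 23, 23)$, so single-pool attention over all tokens would scale with
$(72 + 23 + 23)^2 = 13924$, while the two-stage scheme scales with
$72^2 + 23^2 + 23^2 + 3^2 = 6251$, roughly a 2.2x reduction.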

### Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `patch_len_list` | list of int, optional | `[14, 44, 45]` | Patch lengths for multi-granularity patching; each entry selects a temporal scale. |
| `embed_dim` | int, optional | `128` | Embedding dimensionality. |
| `num_heads` | int, optional | `8` | Number of attention heads; must divide `embed_dim`. |
| `drop_prob` | float, optional | `0.1` | Dropout probability. |
| `no_inter_attn` | bool, optional | `False` | If `True`, disables inter-granularity attention. |
| `num_layers` | int, optional | `6` | Number of encoder layers. |
| `dim_feedforward` | int, optional | `256` | Feedforward dimensionality. |
| `activation_trans` | `nn.Module`, optional | `nn.ReLU` | Activation module used in transformer encoder layers. |
| `single_channel` | bool, optional | `False` | If `True`, processes each channel independently, increasing capacity and cost. |
| `output_attention` | bool, optional | `True` | If `True`, returns attention weights for interpretability. |
| `activation_class` | `nn.Module`, optional | `nn.GELU` | Activation used in the final classification layer. |
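
For reference, a customized instantiation might look like the following. The
values are illustrative, not recommended settings; parameter names follow the
table above:

```python
from braindecode.models import MEDFormer

model = MEDFormer(
    n_chans=22,
    sfreq=250,
    input_window_seconds=4.0,
    n_outputs=4,
    patch_len_list=[8, 32, 64],
    embed_dim=64,
    num_heads=4,   # must divide embed_dim
    num_layers=4,
    drop_prob=0.2,
)
```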

### Notes

- MedFormer outperforms strong baselines across six metrics on five MedTS datasets in a
  subject-independent evaluation [Medformer2024].
- Cross-channel patching provides the largest F1 improvement in ablation studies (average
  +6.10%), highlighting its importance for MedTS tasks [Medformer2024].
- Setting `no_inter_attn` to `True` disables inter-granularity attention while retaining
  intra-granularity attention.

### References

**[Medformer2024]** Wang, Y., Huang, N., Li, T., Yan, Y., & Zhang, X. (2024).
Medformer: A Multi-Granularity Patching Transformer for Medical Time-Series
Classification. In A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet,
J. Tomczak, & C. Zhang (Eds.), Advances in Neural Information Processing
Systems (Vol. 37, pp. 36314-36341). doi:10.52202/079017-1145.

**Hugging Face Hub integration**

When the optional `huggingface_hub` package is installed, all models
automatically gain the ability to be pushed to and loaded from the
Hugging Face Hub. Install with:

```bash
pip install braindecode[hub]
```

**Pushing a model to the Hub:**
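
The card omits the original example code, so the following is a minimal
sketch. It assumes braindecode exposes the standard `huggingface_hub`
mixin API (`push_to_hub` / `from_pretrained`) and uses a placeholder repo id:

```python
from braindecode.models import MEDFormer

model = MEDFormer(n_chans=22, sfreq=250, input_window_seconds=4.0, n_outputs=4)
# "your-username/medformer-example" is a placeholder repo id.
model.push_to_hub("your-username/medformer-example")
```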

**Loading a model from the Hub:**
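
Again a sketch under the same assumed mixin API:

```python
from braindecode.models import MEDFormer

# Placeholder repo id; architecture and configuration are rebuilt from the
# checkpoint's stored hyperparameters.
model = MEDFormer.from_pretrained("your-username/medformer-example")
```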

**Extracting features and replacing the head:**
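
A sketch of head replacement. The `final_layer` attribute name is an
assumption here, based on the convention braindecode models use for their
classification head:

```python
import torch.nn as nn

# Replace the classification head with identity so forward() yields features.
model.final_layer = nn.Identity()  # `final_layer` is assumed, see above
```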

**Saving and restoring full configuration:**
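
A sketch of a save/restore round trip, under the same assumptions as above:

```python
from braindecode.models import MEDFormer

model = MEDFormer(
    n_chans=22, sfreq=250, input_window_seconds=4.0, n_outputs=4,
    drop_prob=0.2,  # model-specific settings travel with the checkpoint
)
model.push_to_hub("your-username/medformer-example")  # placeholder repo id

restored = MEDFormer.from_pretrained("your-username/medformer-example")
```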

All model parameters (both EEG-specific and model-specific such as
dropout rates, activation functions, number of filters) are automatically
saved to the Hub and restored when loading.

See the braindecode tutorial on loading pretrained models for a complete
walkthrough.

## Citation

Please cite both the original paper for this architecture (see the
*References* section above) and braindecode:

```bibtex
@article{aristimunha2025braindecode,
  title   = {Braindecode: a deep learning library for raw electrophysiological data},
  author  = {Aristimunha, Bruno and others},
  journal = {Zenodo},
  year    = {2025},
  doi     = {10.5281/zenodo.17699192},
}
```

## License

BSD-3-Clause for the model code (matching braindecode).
Pretraining-derived weights, if you fine-tune from a checkpoint,
inherit the license of that checkpoint and its training corpus.