---
license: bsd-3-clause
library_name: braindecode
pipeline_tag: feature-extraction
tags:
- eeg
- biosignal
- pytorch
- neuroscience
- braindecode
- foundation-model
- transformer
---

# EEGPT

EEGPT: Pretrained Transformer for Universal and Reliable Representation of EEG Signals, from Wang et al. (2024).

> **Architecture-only repository.** This repo documents the
> `braindecode.models.EEGPT` class. **No pretrained weights are
> distributed here**: instantiate the model and train it on your own
> data, or fine-tune from a published foundation-model checkpoint
> separately.

## Quick start

```bash
pip install braindecode
```

```python
from braindecode.models import EEGPT

model = EEGPT(
    n_chans=22,
    sfreq=200,
    input_window_seconds=4.0,
    n_outputs=2,
)
```

The signal-shape arguments above are example values, not universal
defaults; adjust them to match your recording.

## Documentation

- Full API reference (parameters, references, architecture figure):
  <https://braindecode.org/stable/generated/braindecode.models.EEGPT.html>
- Interactive browser with live instantiation:
  <https://huggingface.co/spaces/braindecode/model-explorer>
- Source on GitHub:
  <https://github.com/braindecode/braindecode/blob/master/braindecode/models/eegpt.py#L21>

## Architecture description

The section below is adapted from the rendered class docstring
(parameters, references, and architecture figure where available).

EEGPT: Pretrained Transformer for Universal and Reliable Representation
of EEG Signals from Wang et al. (2024) [eegpt].

*Foundation Model, Attention/Transformer*

![EEGPT Architecture](https://github.com/BINE022/EEGPT/raw/main/figures/EEGPT.jpg)

*Figure: a) The EEGPT structure patches the input EEG signal into
$p_{i,j}$ and applies masking (50% of time patches and 80% of channel
patches), creating a masked part $\mathcal{M}$ and an unmasked part
$\bar{\mathcal{M}}$. b) Local spatio-temporal embedding maps patches to
tokens. c) Dual self-supervised learning combines Spatio-Temporal
Representation Alignment and Mask-based Reconstruction.*

**EEGPT** is a pretrained transformer model designed for universal EEG
feature extraction. It addresses challenges like low SNR and
inter-subject variability by employing a dual self-supervised learning
method that combines **Spatio-Temporal Representation Alignment** and
**Mask-based Reconstruction** [eegpt].

**Model Overview (Layer-by-layer)** (a shape sketch follows the list)

1. **Patch embedding** (`_PatchEmbed` or `_PatchNormEmbed`): split each
   channel into `patch_size` time patches and project to `embed_dim`,
   yielding tokens with shape `(batch, n_patches, n_chans, embed_dim)`.
2. **Channel embedding** (`chan_embed`): add a learned embedding for
   each channel to preserve spatial identity before attention.
3. **Transformer encoder blocks** (`_EEGTransformer.blocks`): for each
   patch group, append `embed_num` learned summary tokens and process
   the sequence with multi-head self-attention and MLP layers.
4. **Summary extraction**: keep only the summary tokens, apply `norm`
   if set, and reshape back to `(batch, n_patches, embed_num, embed_dim)`.
5. **Task head** (`final_layer`): flatten summary tokens across patches
   and map to `n_outputs`; if `return_encoder_output=True`, return the
   encoder features instead.

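To make the end-to-end flow concrete, here is a minimal sketch that
uses only the public `EEGPT` class (argument values mirror the
quick-start example and are illustrative; the exact shapes should be
checked against your braindecode version):

```python
import torch
from braindecode.models import EEGPT

# Illustrative geometry: 22 channels, 4 s at 200 Hz -> 800 samples.
model = EEGPT(n_chans=22, sfreq=200, input_window_seconds=4.0, n_outputs=2)

x = torch.randn(8, 22, 800)  # (batch, n_chans, n_times)
logits = model(x)
print(logits.shape)          # expected (8, 2): output of the task head
```
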
**Dual Self-Supervised Learning**

EEGPT moves beyond simple masked reconstruction by introducing a
representation alignment objective. The pretraining loss $\mathcal{L}$
is the sum of alignment loss $\mathcal{L}_A$ and reconstruction loss
$\mathcal{L}_R$ (a code sketch of the combined loss follows the list
below):

$$
\mathcal{L} = \mathcal{L}_A + \mathcal{L}_R
$$

1. **Spatio-Temporal Representation Alignment** ($\mathcal{L}_A$):
   aligns the predicted features of masked regions with global features
   extracted by a momentum encoder. This forces the model to learn
   semantic, high-level representations rather than just signal
   waveform details.

   $$
   \mathcal{L}_A = \frac{1}{N} \sum_{j=1}^{N}
   \left\lVert \mathrm{pred}_j - \mathrm{LN}(\mathrm{menc}_j) \right\rVert_2^2
   $$

   where $\mathrm{pred}_j$ is the predictor output and
   $\mathrm{menc}_j$ is the momentum encoder output.

2. **Mask-based Reconstruction** ($\mathcal{L}_R$): a standard
   masked-autoencoder objective that reconstructs the raw EEG patches,
   ensuring local temporal fidelity.

   $$
   \mathcal{L}_R = \frac{1}{\lvert \mathcal{M} \rvert}
   \sum_{(i,j) \in \mathcal{M}}
   \left\lVert \mathrm{rec}_{i,j} - \mathrm{LN}(p_{i,j}) \right\rVert_2^2
   $$

   where $\mathrm{rec}_{i,j}$ is the reconstructed patch and $p_{i,j}$
   is the original patch.

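As a rough illustration of the two objectives, here is a hedged,
framework-level sketch of the combined loss (this is illustrative
code, not braindecode's pretraining implementation; `pred`, `menc`,
`rec`, and `patches` are assumed tensor names):

```python
import torch.nn.functional as F

def eegpt_pretraining_loss(pred, menc, rec, patches):
    """Illustrative sketch of L = L_A + L_R (not the reference code).

    pred:    predictor outputs for masked positions, (N, D)
    menc:    momentum-encoder targets for the same positions, (N, D)
    rec:     reconstructed patches at masked positions, (M, P)
    patches: original patches at the masked positions, (M, P)
    """
    # Alignment: match predictor output to layer-normalised momentum features.
    loss_a = F.mse_loss(pred, F.layer_norm(menc, menc.shape[-1:]))
    # Reconstruction: match reconstructed patches to layer-normalised originals.
    loss_r = F.mse_loss(rec, F.layer_norm(patches, patches.shape[-1:]))
    return loss_a + loss_r
```
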
**Macro Components**

- `EEGPT.target_encoder` **(Universal Encoder)**
  - *Operations.* A hierarchical backbone that consists of **Local
    Spatio-Temporal Embedding** followed by a standard Transformer
    encoder [eegpt].
  - *Role.* Maps raw spatio-temporal EEG patches into a sequence of
    latent tokens $z$.
- `EEGPT.chans_id` **(Channel Identification)**
  - *Operations.* A buffer containing channel indices mapped from the
    standard channel names provided in `chs_info` [eegpt].
  - *Role.* Provides the spatial identity for each input channel,
    allowing the model to look up the correct channel embedding vector
    $\varsigma_i$.
- **Local Spatio-Temporal Embedding** (Input Processing; a sketch of
  this step follows the list)
  - *Operations.* The input signal $X$ is chunked into patches
    $p_{i,j}$. Each patch is linearly projected and summed with a
    channel-specific embedding:
    $\mathrm{token}_{i,j} = \mathrm{Embed}(p_{i,j}) + \varsigma_i$ [eegpt].
  - *Role.* Converts the 2D EEG grid (channels $\times$ time) into a
    unified sequence of tokens that preserves both channel identity and
    temporal order.

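A hedged sketch of the embedding step above (the module and variable
names are illustrative, not braindecode internals, and patches are
taken non-overlapping for simplicity even though the real model also
supports a `patch_stride`):

```python
import torch
import torch.nn as nn

class LocalSpatioTemporalEmbed(nn.Module):
    """Illustrative: token_{i,j} = Embed(p_{i,j}) + channel embedding."""

    def __init__(self, n_chans: int, patch_size: int, embed_dim: int):
        super().__init__()
        self.patch_size = patch_size
        self.proj = nn.Linear(patch_size, embed_dim)        # Embed(.)
        self.chan_embed = nn.Embedding(n_chans, embed_dim)  # one vector per channel

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_chans, n_times)
        b, c, t = x.shape
        # -> (batch, n_chans, n_patches, patch_size), non-overlapping patches
        patches = x.unfold(-1, self.patch_size, self.patch_size)
        tokens = self.proj(patches)  # (batch, n_chans, n_patches, embed_dim)
        chan_ids = torch.arange(c, device=x.device)
        # Broadcast the channel embedding over batch and patch dimensions.
        return tokens + self.chan_embed(chan_ids)[None, :, None, :]
```
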
388
+ <p><strong>How the information is encoded temporally, spatially, and spectrally</strong></p>
389
+ <ul class="simple">
390
+ <li><p><strong>Temporal.</strong>
391
+ The model segments continuous EEG signals into small, non-overlapping patches (e.g., 250ms windows
392
+ with <span class="docutils literal">patch_size=64</span>) <a class="citation-reference" href="#eegpt" id="citation-reference-6" role="doc-biblioref">[eegpt]</a>. This <strong>Patching</strong> mechanism captures short-term local temporal
393
+ structure, while the subsequent Transformer encoder captures long-range temporal dependencies across
394
+ the entire window.</p></li>
395
+ <li><p><strong>Spatial.</strong>
396
+ Unlike convolutional models that may rely on fixed spatial order, EEGPT uses <strong>Channel Embeddings</strong>
397
+ <math xmlns="http://www.w3.org/1998/Math/MathML">
398
+ <msub>
399
+ <mi>ς</mi>
400
+ <mi>i</mi>
401
+ </msub>
402
+ </math> <a class="citation-reference" href="#eegpt" id="citation-reference-7" role="doc-biblioref">[eegpt]</a>. Each channel's data is treated as a distinct sequence of tokens tagged
403
+ with its spatial identity. This allows the model to flexibly handle different montages and
404
+ missing channels by simply mapping channel names to their corresponding learnable embeddings.</p></li>
405
+ <li><p><strong>Spectral.</strong>
406
+ Spectral information is implicitly learned through the <strong>Mask-based Reconstruction</strong> objective
407
+ (<math xmlns="http://www.w3.org/1998/Math/MathML">
408
+ <msub>
409
+ <mi>ℒ</mi>
410
+ <mi>R</mi>
411
+ </msub>
412
+ </math>) <a class="citation-reference" href="#eegpt" id="citation-reference-8" role="doc-biblioref">[eegpt]</a>. By forcing the model to reconstruct raw waveforms (including phase
413
+ and amplitude) from masked inputs, the model learns to encode frequency-specific patterns necessary
414
+ refines this by encouraging these spectral features to align with robust, high-level semantic representations.</p></li>
415
+ </ul>
**Pretrained Weights**

Weights are available on
[Hugging Face](https://huggingface.co/braindecode/eegpt-pretrained).

> **Important: pre-trained weights available.**
> This model has pre-trained weights available on the Hugging Face Hub
> ([link here](https://huggingface.co/braindecode/eegpt-pretrained)).
> Loading them, and pushing your own trained model to the Hub, requires
> installing `braindecode[hub]` for Hub integration; see the Notes
> section below for loading and pushing sketches.

**Usage**

The model can be initialized for specific downstream tasks (e.g.,
classification) by specifying `n_outputs`, `chs_info`, and `n_times`.

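A hedged instantiation sketch using `chs_info` and `n_times` (the
channel names, sampling rate, and MNE-based construction of the
channel metadata are illustrative assumptions):

```python
import mne
from braindecode.models import EEGPT

# Illustrative 10-20 montage; names and rate are assumptions.
ch_names = ["Fz", "Cz", "Pz", "Oz", "C3", "C4", "P3", "P4"]
info = mne.create_info(ch_names=ch_names, sfreq=256.0, ch_types="eeg")

model = EEGPT(
    n_outputs=2,
    chs_info=info["chs"],  # standard channel metadata (names -> chans_id)
    n_times=1024,          # 4 s at 256 Hz
    sfreq=256.0,
)

# For pure feature extraction, return encoder features instead of logits:
encoder = EEGPT(
    n_outputs=2,
    chs_info=info["chs"],
    n_times=1024,
    sfreq=256.0,
    return_encoder_output=True,
)
```
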
### Parameters

- `return_encoder_output` : bool, default=False
  Whether to return the encoder output or the classifier output.
- `patch_size` : int, default=64
  Size of the patches for the transformer.
- `patch_stride` : int, default=32
  Stride of the patches for the transformer.
- `embed_num` : int, default=4
  Number of summary tokens used for the global representation.
- `embed_dim` : int, default=512
  Dimension of the embeddings.
- `depth` : int, default=8
  Number of transformer layers.
- `num_heads` : int, default=8
  Number of attention heads.
- `mlp_ratio` : float, default=4.0
  Ratio of the MLP hidden dimension to the embedding dimension.
- `drop_prob` : float, default=0.0
  Dropout probability.
- `attn_drop_rate` : float, default=0.0
  Attention dropout rate.
- `drop_path_rate` : float, default=0.0
  Drop path rate.
- `init_std` : float, default=0.02
  Standard deviation for weight initialization.
- `qkv_bias` : bool, default=True
  Whether to use bias in the QKV projection.
- `norm_layer` : torch.nn.Module, default=None
  Normalization layer. If None, defaults to `nn.LayerNorm` with epsilon
  `layer_norm_eps`.
- `layer_norm_eps` : float, default=1e-6
  Epsilon value for the normalization layer.

### References

[eegpt] Wang, G., Liu, W., He, Y., Xu, C., Ma, L., & Li, H. (2024).
EEGPT: Pretrained transformer for universal and reliable representation
of EEG signals. *Advances in Neural Information Processing Systems*,
37, 39249-39280. Online:
<https://proceedings.neurips.cc/paper_files/paper/2024/file/4540d267eeec4e5dbd9dae9448f0b739-Paper-Conference.pdf>

### Notes

When loading pretrained weights from the original EEGPT checkpoint
(e.g., for fine-tuning), you may encounter "unexpected keys" related to
the `predictor` and `reconstructor` modules (e.g.,
`predictor.mask_token`, `reconstructor.time_embed`). These components
are used only during the self-supervised pre-training phase (Masked
Auto-Encoder) and are not part of this encoder-only model used for
downstream tasks. It is safe to ignore them.

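A hedged sketch of tolerant weight loading (the repo id and filename
are placeholders, and the checkpoint layout is an assumption;
`load_state_dict(strict=False)` is the standard PyTorch way to skip the
pre-training-only keys described above):

```python
import torch
from braindecode.models import EEGPT
from huggingface_hub import hf_hub_download

model = EEGPT(n_chans=22, sfreq=200, input_window_seconds=4.0, n_outputs=2)

# Placeholder repo id / filename; check the checkpoint you actually use.
ckpt_path = hf_hub_download(
    repo_id="braindecode/eegpt-pretrained",
    filename="checkpoint.pt",
)
ckpt = torch.load(ckpt_path, map_location="cpu")
state_dict = ckpt.get("state_dict", ckpt)  # unwrap if saved as a full checkpoint

# strict=False skips the pre-training-only predictor.*/reconstructor.* keys.
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print("ignored pre-training keys:", unexpected)
```
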
**Hugging Face Hub integration**

When the optional `huggingface_hub` package is installed, all models
automatically gain the ability to be pushed to and loaded from the
Hugging Face Hub. Install with:

```bash
pip install braindecode[hub]
```

The integration covers pushing a model to the Hub, loading a model from
the Hub, extracting features and replacing the head, and saving and
restoring the full configuration: all model parameters (both
EEG-specific and model-specific, such as dropout rates, activation
functions, and number of filters) are automatically saved to the Hub
and restored when loading.

See the *load-pretrained-models* tutorial in the braindecode
documentation for a complete walkthrough.

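Since the original code examples for these operations were not rendered
in this card, here is a hedged sketch of pushing weights using the
generic `huggingface_hub` API rather than braindecode's built-in
helpers (the repo id is a placeholder; see the braindecode tutorial for
the integrated workflow):

```python
import torch
from braindecode.models import EEGPT
from huggingface_hub import HfApi

model = EEGPT(n_chans=22, sfreq=200, input_window_seconds=4.0, n_outputs=2)
# ... train or fine-tune the model here ...

# Save the weights locally, then upload them to a (placeholder) repo.
torch.save(model.state_dict(), "eegpt_state_dict.pt")

api = HfApi()
api.create_repo("your-username/eegpt-finetuned", exist_ok=True)
api.upload_file(
    path_or_fileobj="eegpt_state_dict.pt",
    path_in_repo="eegpt_state_dict.pt",
    repo_id="your-username/eegpt-finetuned",
)
```
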
## Citation

Please cite both the original paper for this architecture (see the
*References* section above) and braindecode:

```bibtex
@article{aristimunha2025braindecode,
  title   = {Braindecode: a deep learning library for raw electrophysiological data},
  author  = {Aristimunha, Bruno and others},
  journal = {Zenodo},
  year    = {2025},
  doi     = {10.5281/zenodo.17699192},
}
```

## License

BSD-3-Clause for the model code (matching braindecode).
Pretraining-derived weights, if you fine-tune from a checkpoint,
inherit the license of that checkpoint and its training corpus.