bruAristimunha committed on
Commit 808fd41 · verified · 1 Parent(s): 88f567d

Add architecture-only model card

Files changed (1): README.md (+336 −0)
---
license: bsd-3-clause
library_name: braindecode
pipeline_tag: feature-extraction
tags:
- eeg
- biosignal
- pytorch
- neuroscience
- braindecode
- foundation-model
- convolutional
- transformer
---

# InterpolatedBENDR

Channel-interpolating wrapper around the `BENDR` model.

> **Architecture-only repository.** This repo documents the
> `braindecode.models.InterpolatedBENDR` class. **No pretrained weights are
> distributed here** — instantiate the model and train it on your own
> data, or fine-tune from a published foundation-model checkpoint
> separately.

## Quick start

```bash
pip install braindecode
```

```python
from braindecode.models import InterpolatedBENDR

model = InterpolatedBENDR(
    n_chans=20,
    sfreq=256,
    input_window_seconds=4.0,
    n_outputs=2,
)
```

The signal-shape arguments above are example defaults — adjust them
to match your recording.

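As a quick sanity check on input shapes, the number of time samples per window is simply `sfreq * input_window_seconds` (values as in the constructor call above):

```python
# Expected input length in samples for the example model above.
# These values mirror the quick-start constructor call.
sfreq = 256                 # sampling rate in Hz
input_window_seconds = 4.0  # window length in seconds

n_times = int(sfreq * input_window_seconds)
print(n_times)  # → 1024
```

A batch passed to the model would then have shape `(batch_size, n_chans, n_times)`, i.e. `(B, 20, 1024)` here.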
## Documentation

- Full API reference (parameters, references, architecture figure):
  <https://braindecode.org/stable/generated/braindecode.models.InterpolatedBENDR.html>
- Interactive browser with live instantiation:
  <https://huggingface.co/spaces/braindecode/model-explorer>
- Source on GitHub: <https://github.com/braindecode/braindecode/blob/master/braindecode/models/interpolated.py#L1>

## Architecture description

The block below is the rendered class docstring (parameters,
references, and architecture figure where available).

<div class='bd-doc'><main>
<p>Channel-interpolating wrapper around :class:`BENDR`.</p>
<p>:bdg-dark-line:`Channel`</p>
<p>Accepts arbitrary user <span class="docutils literal">chs_info</span> and projects them to the
backbone's canonical channel set via
:class:`~braindecode.modules.ChannelInterpolationLayer`.</p>
<p>For all other parameters and behavior see the backbone
documentation reproduced below.</p>
<p>BENDR (BErt-inspired Neural Data Representations) from Kostas et al. (2021) [bendr]_.</p>
<span style="display:inline-block;padding:2px 8px;border-radius:4px;background:#5cb85c;color:white;font-size:11px;font-weight:600;margin-right:4px;">Convolution</span><span style="display:inline-block;padding:2px 8px;border-radius:4px;background:#d9534f;color:white;font-size:11px;font-weight:600;margin-right:4px;">Foundation Model</span>

.. figure:: https://www.frontiersin.org/files/Articles/653659/fnhum-15-653659-HTML/image_m/fnhum-15-653659-g001.jpg
   :align: center
   :alt: BENDR Architecture
   :width: 1000px

The **BENDR** architecture adapts techniques used for language modeling (LM) toward the
development of encephalography modeling (EM) [bendr]_. It utilizes a self-supervised
training objective to learn compressed representations of raw EEG signals [bendr]_. The
model is capable of modeling completely novel raw EEG sequences recorded with differing
hardware and subjects, aiming for transferable performance across a variety of downstream
BCI and EEG classification tasks [bendr]_.

.. rubric:: Architectural Overview

BENDR is adapted from wav2vec 2.0 [wav2vec2]_ and is composed of two main stages: a
feature extractor (convolutional stage) that produces BErt-inspired Neural Data
Representations (BENDR), followed by a transformer encoder (contextualizer) [bendr]_.

.. rubric:: Macro Components

- `BENDR.encoder` **(Convolutional Stage / Feature Extractor)**

  - *Operations.* A stack of six short-receptive-field 1D convolutions [bendr]_. Each
    block consists of a 1D convolution, GroupNorm, and a GELU activation.
  - *Role.* Takes raw data :math:`X_{raw}` and dramatically downsamples it to a new
    sequence of vectors (BENDR) [bendr]_. Each resulting vector has a length of 512.

- `BENDR.contextualizer` **(Transformer Encoder)**

  - *Operations.* A transformer encoder that uses layered, multi-head self-attention
    [bendr]_. It employs T-Fixup weight initialization [tfixup]_ and uses 8 layers
    and 8 heads.
  - *Role.* Maps the sequence of BENDR vectors to a contextualized sequence. The output
    of a fixed start token is typically used as the aggregate representation for
    downstream classification [bendr]_.

- `Contextualizer.position_encoder` **(Positional Encoding)**

  - *Operations.* An additive (grouped) convolution layer with a receptive field of 25
    and 16 groups [bendr]_.
  - *Role.* Encodes position information before the input enters the transformer.

.. rubric:: How the information is encoded temporally, spatially, and spectrally

* **Temporal.**
  The convolutional encoder uses a stack of blocks where the stride matches the receptive
  field (e.g., 3 for the first block, 2 for subsequent blocks) [bendr]_. This process
  downsamples the raw data by a factor of 96, resulting in an effective sampling frequency
  of approximately 2.67 Hz.
* **Spatial.**
  To maintain simplicity and reduce complexity, the convolutional stage uses **1D
  convolutions** and elects not to mix EEG channels across the first stage [bendr]_. The
  input includes 20 channels (19 EEG channels and one relative amplitude channel).
* **Spectral.**
  The convolution operations implicitly extract features from the raw EEG signal [bendr]_.
  The representations (BENDR) are derived from the raw waveform using convolutional
  operations followed by sequence modeling [wav2vec2]_.

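Because each encoder block's stride matches its receptive field, the total downsampling factor is simply the product of the strides. A quick stdlib check of the figures quoted above (96x downsampling, ~2.67 Hz effective rate at 256 Hz):

```python
from math import prod

# Encoder strides as described above: 3 for the first block, 2 for the rest.
strides = (3, 2, 2, 2, 2, 2)

downsample = prod(strides)          # total downsampling factor
effective_sfreq = 256 / downsample  # effective rate for a 256 Hz recording

print(downsample)                 # → 96
print(round(effective_sfreq, 2))  # → 2.67
```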
.. rubric:: Additional Mechanisms

- **Self-Supervision (Pre-training).** Uses a masked sequence learning approach (adapted
  from wav2vec 2.0 [wav2vec2]_) where contiguous spans of BENDR sequences are masked, and
  the model attempts to reconstruct the original underlying encoded vector based on the
  transformer output and a set of negative distractors [bendr]_.
- **Regularization.** LayerDrop [layerdrop]_ and Dropout (at probabilities 0.01 and 0.15,
  respectively) are used during pre-training [bendr]_. The implementation also uses T-Fixup
  scaling for parameter initialization [tfixup]_.
- **Input Conditioning.** A fixed token (a vector filled with the value **-5**) is
  prepended to the BENDR sequence before input to the transformer, serving as the aggregate
  representation token [bendr]_.

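The contiguous-span masking above can be sketched in plain Python. This is only an illustration of the span-selection logic: the span length, start probability, and seed here are arbitrary choices, not the paper's hyperparameters.

```python
import random

def mask_spans(seq_len, span_len=3, p_start=0.1, seed=0):
    """Return the set of masked indices: each position may start a
    contiguous masked span of ``span_len`` steps, as in wav2vec-style
    masked sequence learning (spans are allowed to overlap)."""
    rng = random.Random(seed)
    masked = set()
    for t in range(seq_len):
        if rng.random() < p_start:
            masked.update(range(t, min(t + span_len, seq_len)))
    return masked

masked = mask_spans(seq_len=20)
# During pre-training, the model would reconstruct the encoder vectors
# at these positions from the transformer output and negative distractors.
print(sorted(masked))
```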
.. important::
   **Pre-trained Weights Available**

   This model has pre-trained weights available on the Hugging Face Hub.
   You can load them using:

   .. code:: python

      from braindecode.models import BENDR

      # Load pre-trained model from Hugging Face Hub
      # you can specify `n_outputs` for your downstream task
      model = BENDR.from_pretrained("braindecode/braindecode-bendr", n_outputs=2)

   To push your own trained model to the Hub:

   .. code:: python

      # After training your model
      model.push_to_hub(
          repo_id="username/my-bendr-model", commit_message="Upload trained BENDR model"
      )

   Requires installing ``braindecode[hub]`` for Hub integration.

Notes
-----
* The full BENDR architecture contains a large number of parameters; configuration (1)
  involved training over **one billion parameters** [bendr]_.
* A randomly initialized full BENDR architecture was generally ineffective at solving
  downstream tasks without prior self-supervised training [bendr]_.
* The pre-training task (contrastive predictive coding via masking) is generalizable,
  exhibiting strong uniformity of performance across novel subjects, hardware, and
  tasks [bendr]_.

.. warning::

   **Important:** To utilize the full potential of BENDR, the model requires
   **self-supervised pre-training** on large, unlabeled EEG datasets (like TUEG) followed
   by subsequent fine-tuning on the specific downstream classification task [bendr]_.

References
----------
.. [bendr] Kostas, D., Aroca-Ouellette, S., & Rudzicz, F. (2021).
   BENDR: Using transformers and a contrastive self-supervised learning task to learn from
   massive amounts of EEG data.
   Frontiers in Human Neuroscience, 15, 653659.
   https://doi.org/10.3389/fnhum.2021.653659
.. [wav2vec2] Baevski, A., Zhou, Y., Mohamed, A., & Auli, M. (2020).
   wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations.
   In H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, & H. Lin (Eds.),
   Advances in Neural Information Processing Systems (Vol. 33, pp. 12449-12460).
   https://dl.acm.org/doi/10.5555/3495724.3496768
.. [tfixup] Huang, T. K., Liang, S., Jha, A., & Salakhutdinov, R. (2020).
   Improving Transformer Optimization Through Better Initialization.
   In International Conference on Machine Learning (pp. 4475-4483). PMLR.
   https://dl.acm.org/doi/10.5555/3524938.3525354
.. [layerdrop] Fan, A., Grave, E., & Joulin, A. (2020).
   Reducing Transformer Depth on Demand with Structured Dropout.
   International Conference on Learning Representations.
   https://openreview.net/forum?id=SylO2yStDr

Parameters
----------
encoder_h : int, default=512
    Hidden size (number of output channels) of the convolutional encoder. This determines
    the dimensionality of the BENDR feature vectors produced by the encoder.
contextualizer_hidden : int, default=3076
    Hidden size of the feedforward layer within each transformer block. The paper uses
    approximately 2x the transformer dimension (3076 ~ 2 x 1536).
projection_head : bool, default=False
    If True, adds a projection layer at the end of the encoder to project back to the
    input feature size. This is used during self-supervised pre-training but typically
    disabled during fine-tuning.
drop_prob : float, default=0.1
    Dropout probability applied throughout the model. The paper recommends 0.15 for
    pre-training and 0.0 for fine-tuning. Default is 0.1 as a compromise.
layer_drop : float, default=0.0
    Probability of dropping entire transformer layers during training (LayerDrop
    regularization [layerdrop]_). The paper uses 0.01 for pre-training and 0.0 for
    fine-tuning.
activation : :class:`torch.nn.Module`, default=:class:`torch.nn.GELU`
    Activation function used in the encoder convolutional blocks. The paper uses GELU
    activation throughout.
transformer_layers : int, default=8
    Number of transformer encoder layers in the contextualizer. The paper uses 8 layers.
transformer_heads : int, default=8
    Number of attention heads in each transformer layer. The paper uses 8 heads with a
    head dimension of 192 (1536 / 8).
position_encoder_length : int, default=25
    Kernel size for the convolutional positional encoding layer. The paper uses a
    receptive field of 25 with 16 groups.
enc_width : tuple of int, default=(3, 2, 2, 2, 2, 2)
    Kernel sizes for each of the 6 convolutional blocks in the encoder. Each value
    corresponds to one block.
enc_downsample : tuple of int, default=(3, 2, 2, 2, 2, 2)
    Stride values for each of the 6 convolutional blocks in the encoder. The total
    downsampling factor is the product of all strides (3 x 2 x 2 x 2 x 2 x 2 = 96).
start_token : int or float, default=-5
    Value used to fill the start token embedding that is prepended to the BENDR sequence
    before input to the transformer. This token's output is used as the aggregate
    representation for classification.
final_layer : bool, default=True
    If True, includes a final linear classification layer that maps from encoder_h to
    n_outputs. If False, the model outputs the contextualized features directly.
encoder_only : bool, default=False
    If True, bypass the contextualizer and use 4-chunk temporal pooling on the encoder
    output instead. This corresponds to the encoder-only configuration described in
    Section 2.4 and Table 2 of Kostas et al. (2021) [bendr]_, which outperformed the
    full model on 4 out of 5 downstream tasks. The encoder output is split into 4 equal
    temporal chunks, each chunk is mean-pooled, and the results are concatenated to
    produce a feature vector of size ``encoder_h * 4`` (2048-dim with default settings).
    The contextualizer is still created (to allow loading pretrained weights) but is not
    used in the forward pass. Requires input length of at least
    ``4 * product(enc_downsample)`` samples (384 with default downsampling of 96x).

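The 4-chunk temporal pooling described for ``encoder_only`` can be sketched with plain lists. This is a simplified stand-in for the real tensor operation; ``encoder_h`` is shrunk to 4 (instead of 512) so the example stays readable.

```python
# Sketch of the encoder-only 4-chunk temporal pooling described above.
encoder_h = 4  # feature size per encoder step (512 in the real model)
n_steps = 8    # encoder output length; must be at least 4

# Fake encoder output: n_steps vectors of size encoder_h.
seq = [[float(t)] * encoder_h for t in range(n_steps)]

chunk = n_steps // 4
pooled = []
for c in range(4):
    block = seq[c * chunk:(c + 1) * chunk]
    # Mean-pool each feature dimension within the chunk...
    mean = [sum(v[i] for v in block) / len(block) for i in range(encoder_h)]
    # ...and concatenate the four chunk means into one feature vector.
    pooled.extend(mean)

print(len(pooled))  # → 16, i.e. encoder_h * 4 (2048 with encoder_h=512)
```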
.. rubric:: Hugging Face Hub integration

When the optional ``huggingface_hub`` package is installed, all models
automatically gain the ability to be pushed to and loaded from the
Hugging Face Hub. Install with::

   pip install braindecode[hub]

**Pushing a model to the Hub:**

.. code:: python

   from braindecode.models import BENDR

   # Train your model
   model = BENDR(n_chans=22, n_outputs=4, n_times=1000)
   # ... training code ...

   # Push to the Hub
   model.push_to_hub(
       repo_id="username/my-bendr-model",
       commit_message="Initial model upload",
   )

**Loading a model from the Hub:**

.. code:: python

   from braindecode.models import BENDR

   # Load pretrained model
   model = BENDR.from_pretrained("username/my-bendr-model")

   # Load with a different number of outputs (head is rebuilt automatically)
   model = BENDR.from_pretrained("username/my-bendr-model", n_outputs=4)

**Extracting features and replacing the head:**

.. code:: python

   import torch

   x = torch.randn(1, model.n_chans, model.n_times)
   # Extract encoder features (consistent dict across all models)
   out = model(x, return_features=True)
   features = out["features"]

   # Replace the classification head
   model.reset_head(n_outputs=10)

**Saving and restoring full configuration:**

.. code:: python

   import json

   config = model.get_config()  # all __init__ params
   with open("config.json", "w") as f:
       json.dump(config, f)

   model2 = BENDR.from_config(config)  # reconstruct (no weights)

All model parameters (both EEG-specific and model-specific, such as
dropout rates, activation functions, and number of filters) are automatically
saved to the Hub and restored when loading.

See :ref:`load-pretrained-models` for a complete tutorial.</main>
</div>

## Citation

Please cite both the original paper for this architecture (see the
*References* section above) and braindecode:

```bibtex
@article{aristimunha2025braindecode,
  title   = {Braindecode: a deep learning library for raw electrophysiological data},
  author  = {Aristimunha, Bruno and others},
  journal = {Zenodo},
  year    = {2025},
  doi     = {10.5281/zenodo.17699192},
}
```

## License

BSD-3-Clause for the model code (matching braindecode).
Pretraining-derived weights, if you fine-tune from a checkpoint,
inherit the license of that checkpoint and its training corpus.