Fill-Mask · Transformers · Safetensors · modernbert · masked-lm · long-context
timpal0l committed · Commit a3566cd · verified · 1 Parent(s): 68322d3

Update README.md

Files changed (1): README.md +23 -0
README.md CHANGED
@@ -29,6 +29,29 @@ This checkpoint continues the pre-training of `answerdotai/ModernBERT-large` on
  | Optimizer & schedule | Decoupled StableAdamW, lr 2e-4, cosine decay (1% warm-up) |
  | Precision | AMP-bf16 |
  | Hardware | 8 nodes × 8 AMD MI250X GPUs (64 GPUs) on the EuroHPC **LUMI-G** system |
+ ## Training log
+ ```text
+ [token=982585522155/1198510347252]:
+ Train time/batch: 716208
+ Train time/sample: 137511936
+ Train time/batch_in_epoch: 716208
+ Train time/sample_in_epoch: 137511936
+ Train time/token: 982584117341
+ Train time/token_in_epoch: 982584117341
+ Train trainer/device_train_microbatch_size: 3
+ Train loss/train/total: 0.8162
+ Train throughput/batches_per_sec: 0.6466
+ Train throughput/samples_per_sec: 124.1393
+ Train throughput/device/batches_per_sec: 0.0101
+ Train throughput/device/samples_per_sec: 1.9397
+ Train throughput/tokens_per_sec: 887795.9110
+ Train throughput/device/tokens_per_sec: 13871.8111
+ Train time/train: 317.5722
+ Train time/val: 0.0000
+ Train time/total: 317.5722
+ Train lr-StableAdamW/group0: 0.0000
+ Train lr-StableAdamW/group1: 0.0000
+ ```
  ## Intended Use
  * Fill-mask inference, embedding extraction and fine-tuning for Scandinavian downstream NLP tasks (classification, NER, QA, etc.).
  * Drop-in replacement for BERT-style encoders (omit `token_type_ids`).
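
As a consistency check on the log above, the aggregate throughput figures are the per-device numbers scaled by the 64 GPUs: 13,871.8111 tokens/s/device × 64 ≈ 887,795.9 tokens/s, matching `Train throughput/tokens_per_sec` (likewise 1.9397 samples/s/device × 64 ≈ 124.1 samples/s).

Below is a minimal sketch of the first two uses named under "Intended Use" (fill-mask inference and embedding extraction), assuming the checkpoint loads with `transformers`. The `MODEL_ID` is a placeholder rather than the actual repo id, and `[MASK]` is assumed to be the tokenizer's mask token:

```python
import torch
from transformers import AutoModel, AutoTokenizer, pipeline

MODEL_ID = "your-org/scandi-modernbert-large"  # placeholder -- substitute the real repo id

# Fill-mask inference: the pipeline tokenizes, runs the MLM head,
# and returns the top candidates for the masked position.
fill = pipeline("fill-mask", model=MODEL_ID)
for pred in fill("Stockholm är huvudstaden i [MASK]."):
    print(f"{pred['token_str']!r}: {pred['score']:.3f}")

# Embedding extraction: mean-pool the final hidden states over non-padding tokens.
tok = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID).eval()
batch = tok(
    ["En kort svensk mening.", "En annan, något längre mening."],
    padding=True,
    return_tensors="pt",
)
with torch.no_grad():
    hidden = model(**batch).last_hidden_state         # (batch, seq_len, hidden)
mask = batch["attention_mask"].unsqueeze(-1).float()  # (batch, seq_len, 1)
embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
print(embeddings.shape)                               # (2, hidden)
```

In line with the card's note, nothing here passes `token_type_ids`; the model is used exactly like a classic BERT encoder minus the segment ids.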