This checkpoint continues the pre-training of `answerdotai/ModernBERT-large` on
| Optimizer & schedule | Decoupled StableAdamW, lr 2e-4, cosine decay (1% warm-up) |
| Precision | AMP-bf16 |
| Hardware | 8 nodes × 8 AMD MI250X GPUs (64 GPUs) on the EuroHPC **LUMI-G** system |

## Training log

```
[token=982585522155/1198510347252]:
Train time/batch: 716208
Train time/sample: 137511936
Train time/batch_in_epoch: 716208
Train time/sample_in_epoch: 137511936
Train time/token: 982584117341
Train time/token_in_epoch: 982584117341
Train trainer/device_train_microbatch_size: 3
Train loss/train/total: 0.8162
Train throughput/batches_per_sec: 0.6466
Train throughput/samples_per_sec: 124.1393
Train throughput/device/batches_per_sec: 0.0101
Train throughput/device/samples_per_sec: 1.9397
Train throughput/tokens_per_sec: 887795.9110
Train throughput/device/tokens_per_sec: 13871.8111
Train time/train: 317.5722
Train time/val: 0.0000
Train time/total: 317.5722
Train lr-StableAdamW/group0: 0.0000
Train lr-StableAdamW/group1: 0.0000
```
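As a quick sanity check, the logged figures are mutually consistent. This sketch only re-derives relationships between numbers copied from the log above; the 64-device count comes from the hardware row, and the effective global batch size of 192 is inferred rather than stated anywhere in the log:

```python
# All constants below are copied from the training log / hardware table above.
n_devices = 64                          # 8 nodes x 8 MI250X
microbatch = 3                          # trainer/device_train_microbatch_size

samples = 137_511_936                   # Train time/sample
batches = 716_208                       # Train time/batch
assert samples % batches == 0
global_batch = samples // batches       # samples per optimizer step
print(global_batch)                     # -> 192 = 64 devices * microbatch 3
assert global_batch == n_devices * microbatch

# Cluster-wide and per-device throughput should differ by the device count.
print(round(124.1393 / 1.9397))         # samples/s ratio        -> 64
print(round(887795.9110 / 13871.8111))  # tokens/s ratio         -> 64

# Progress through the token budget at this log line.
print(round(982_585_522_155 / 1_198_510_347_252 * 100, 1))       # -> 82.0
```

Consistent per-device vs. global numbers like these are a quick way to confirm no ranks were silently dropped during a multi-node run.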
## Intended Use

* Fill-mask inference, embedding extraction, and fine-tuning for Scandinavian downstream NLP tasks (classification, NER, QA, etc.).
* Drop-in replacement for BERT-style encoders (omit `token_type_ids`).
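The drop-in point can be sketched as follows: when reusing an existing BERT preprocessing pipeline, remove the `token_type_ids` key from the encoded batch before the forward pass, since this encoder does not accept it. The dict below is a stand-in for a tokenizer's output (a minimal sketch; real code would use the checkpoint's own tokenizer):

```python
# Minimal sketch of adapting BERT-style preprocessing for an encoder that
# takes no token_type_ids. The dict stands in for a tokenizer's output.
batch = {
    "input_ids": [[101, 2054, 2003, 103, 102]],
    "attention_mask": [[1, 1, 1, 1, 1]],
    "token_type_ids": [[0, 0, 0, 0, 0]],  # emitted by BERT tokenizers
}
batch.pop("token_type_ids", None)  # safe even if the key is already absent
print(sorted(batch))               # -> ['attention_mask', 'input_ids']
```

`dict.pop` with a default makes the adapter idempotent, so the same preprocessing code works whether the upstream tokenizer emits `token_type_ids` or not.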