Fill-Mask · Transformers · Safetensors · modernbert · masked-lm · long-context
timpal0l committed · Commit a3566cd · verified · 1 Parent(s): 68322d3

Update README.md

Files changed (1): README.md +23 -0
README.md CHANGED
@@ -29,6 +29,29 @@ This checkpoint continues the pre-training of `answerdotai/ModernBERT-large` on
  | Optimizer & schedule | Decoupled StableAdamW, lr 2e-4, cosine decay (1% warm-up) |
  | Precision | AMP-bf16 |
  | Hardware | 8 nodes × 8 AMD MI250X GPUs (64 GPUs) on the EuroHPC **LUMI-G** system |
+ ## Training log
+ ```text
+ [token=982585522155/1198510347252]:
+ Train time/batch: 716208
+ Train time/sample: 137511936
+ Train time/batch_in_epoch: 716208
+ Train time/sample_in_epoch: 137511936
+ Train time/token: 982584117341
+ Train time/token_in_epoch: 982584117341
+ Train trainer/device_train_microbatch_size: 3
+ Train loss/train/total: 0.8162
+ Train throughput/batches_per_sec: 0.6466
+ Train throughput/samples_per_sec: 124.1393
+ Train throughput/device/batches_per_sec: 0.0101
+ Train throughput/device/samples_per_sec: 1.9397
+ Train throughput/tokens_per_sec: 887795.9110
+ Train throughput/device/tokens_per_sec: 13871.8111
+ Train time/train: 317.5722
+ Train time/val: 0.0000
+ Train time/total: 317.5722
+ Train lr-StableAdamW/group0: 0.0000
+ Train lr-StableAdamW/group1: 0.0000
+ ```
  ## Intended Use
  * Fill-mask inference, embedding extraction and fine-tuning for Scandinavian downstream NLP tasks (classification, NER, QA, etc.).
  * Drop-in replacement for BERT-style encoders (omit `token_type_ids`).
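
As a consistency check on the log above, the aggregate throughput figures are the per-device numbers scaled by the 64 GPUs: 13,871.8111 tokens/s/device × 64 ≈ 887,795.9 tokens/s, matching `Train throughput/tokens_per_sec` (likewise 1.9397 samples/s/device × 64 ≈ 124.1 samples/s).

Below is a minimal sketch of the first two uses named under "Intended Use" (fill-mask inference and embedding extraction), assuming the checkpoint loads with `transformers`. The `MODEL_ID` is a placeholder rather than the actual repo id, and `[MASK]` is assumed to be the tokenizer's mask token:

```python
import torch
from transformers import AutoModel, AutoTokenizer, pipeline

MODEL_ID = "your-org/scandi-modernbert-large"  # placeholder -- substitute the real repo id

# Fill-mask inference: the pipeline tokenizes, runs the MLM head,
# and returns the top candidates for the masked position.
fill = pipeline("fill-mask", model=MODEL_ID)
for pred in fill("Stockholm är huvudstaden i [MASK]."):
    print(f"{pred['token_str']!r}: {pred['score']:.3f}")

# Embedding extraction: mean-pool the final hidden states over non-padding tokens.
tok = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID).eval()
batch = tok(
    ["En kort svensk mening.", "En annan, något längre mening."],
    padding=True,
    return_tensors="pt",
)
with torch.no_grad():
    hidden = model(**batch).last_hidden_state         # (batch, seq_len, hidden)
mask = batch["attention_mask"].unsqueeze(-1).float()  # (batch, seq_len, 1)
embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
print(embeddings.shape)                               # (2, hidden)
```

In line with the card's note, nothing here passes `token_type_ids`; the model is used exactly like a classic BERT encoder minus the segment ids.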