ajders committed
Commit 29b302a
1 Parent(s): 017aa76

Update README.md

Files changed (1): README.md +41 -13
README.md CHANGED
@@ -13,23 +13,11 @@ should probably proofread and complete it, then remove this comment. -->
 
 # nl_electra
 
-This model is a fine-tuned version of [](https://huggingface.co/) on the None dataset.
+This model is a pretrained version of [ELECTRA](https://huggingface.co/docs/transformers/model_doc/electra) on the Dutch subset of the [CC100](https://huggingface.co/datasets/cc100) dataset.
 It achieves the following results on the evaluation set:
 - Loss: 2.4650
 - Accuracy: 0.5392
 
-## Model description
-
-More information needed
-
-## Intended uses & limitations
-
-More information needed
-
-## Training and evaluation data
-
-More information needed
-
 ## Training procedure
 
 ### Training hyperparameters
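
The `+` line above describes the checkpoint as a Dutch ELECTRA pretrained with a masked-LM-style objective (note the `mlm_probability` setting in the configuration hunk further down). A minimal usage sketch, assuming the checkpoint is published under a Hub id like `ajders/nl_electra` (hypothetical, inferred from the committer name and model name, not stated in the card) and exposes a masked-LM head:

```python
# Sketch only: the Hub id below is an assumption, not confirmed by the card.
from transformers import AutoTokenizer, pipeline

model_id = "ajders/nl_electra"  # hypothetical Hub id
tok = AutoTokenizer.from_pretrained(model_id)
fill = pipeline("fill-mask", model=model_id, tokenizer=tok)

# Dutch: "Amsterdam is the capital of [MASK]."
print(fill(f"Amsterdam is de hoofdstad van {tok.mask_token}."))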
@@ -652,3 +640,43 @@ The following hyperparameters were used during training:
 - Pytorch 1.12.0+cu102
 - Datasets 2.3.2
 - Tokenizers 0.12.1
+
+### Additional configurations
+
+```
+data:
+  dataset_name: cc100
+  lang: nl
+  overwrite_cache: False
+  validation_split_percentage: 5
+  max_seq_length: 512
+  preprocessing_num_workers: 8
+  mlm_probability: 0.15
+  line_by_line: False
+  pad_to_max_length: False
+  max_train_samples: -1
+  max_eval_samples: -1
+
+training:
+  do_train: True
+  do_eval: True
+  do_predict: True
+  resume_from_checkpoint: False
+  evaluation_strategy: steps
+  eval_steps: 500
+  per_device_train_batch_size: 16
+  per_device_eval_batch_size: 16
+  gradient_accumulation_steps: 32
+  eval_accumulation_steps: 1
+  learning_rate: 5e-5
+  weight_decay: 0.0
+  adam_beta1: 0.9
+  adam_beta2: 0.999
+  adam_epsilon: 1e-8
+  max_grad_norm: 1.0
+  num_train_epochs: 400.0
+  lr_scheduler_type: linear
+  fp16: False
+  warmup_steps: 8000
+  seed: 703
+```
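
The added block is the author's training configuration. As a sketch of how such a config could drive the standard `datasets`/`transformers` APIs (the file name `config.yaml` and the direct key-to-argument mapping are assumptions, not the author's actual training script):

```python
# Sketch: load the YAML config above and map it onto library calls.
import yaml
from datasets import load_dataset
from transformers import TrainingArguments

with open("config.yaml") as f:  # hypothetical path to the YAML above
    cfg = yaml.safe_load(f)

# data section: CC100, Dutch subset, as named in the card.
raw = load_dataset(cfg["data"]["dataset_name"], lang=cfg["data"]["lang"])

train_cfg = dict(cfg["training"])
# PyYAML (YAML 1.1) parses "5e-5" and "1e-8" as strings; coerce to float.
for key in ("learning_rate", "adam_epsilon"):
    train_cfg[key] = float(train_cfg[key])
# These two flags are consumed by the training script itself rather than
# needed for constructing TrainingArguments here.
for key in ("do_predict", "resume_from_checkpoint"):
    train_cfg.pop(key)

args = TrainingArguments(output_dir="nl_electra", **train_cfg)
print(args.learning_rate, args.warmup_steps, args.num_train_epochs)
```

All remaining keys in the `training` section (`evaluation_strategy`, `adam_beta1`, `lr_scheduler_type`, `warmup_steps`, and so on) match `TrainingArguments` field names one-for-one, which is why the dict can be splatted directly into the constructor.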