benjamin committed on
Commit
d63f342
1 Parent(s): da6967f

Update README.md

Files changed (1)
  1. README.md +29 -42
README.md CHANGED
@@ -1,56 +1,43 @@
  ---
- tags:
- - generated_from_trainer
- model-index:
- - name: roberta_large_ukrainian
-   results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->

- # roberta_large_ukrainian

- This model is a fine-tuned version of [roberta_large_uk](https://huggingface.co/roberta_large_uk) on the None dataset.

- ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

- ## Training and evaluation data

- More information needed

- ## Training procedure

- ### Training hyperparameters

- The following hyperparameters were used during training:
- - learning_rate: 0.0001
- - train_batch_size: 4
- - eval_batch_size: 4
- - seed: 42
- - distributed_type: tpu
- - num_devices: 8
- - gradient_accumulation_steps: 16
- - total_train_batch_size: 512
- - total_eval_batch_size: 32
- - optimizer: Adam with betas=(0.9,0.98) and epsilon=1e-06
- - lr_scheduler_type: linear
- - lr_scheduler_warmup_steps: 25000
- - training_steps: 250000
-
- ### Training results
-
-
- ### Framework versions
-
- - Transformers 4.18.0.dev0
- - Pytorch 1.10.0+cu102
- - Datasets 1.18.4
- - Tokenizers 0.11.6
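As a side note, the hyperparameter list in the removed card maps one-to-one onto the standard `transformers` `TrainingArguments`. A minimal sketch of that configuration follows; the `output_dir` and the surrounding training script are assumptions, not part of this commit:

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the removed card's training configuration;
# every value below is taken verbatim from the hyperparameter list above.
training_args = TrainingArguments(
    output_dir="roberta_large_ukrainian",  # assumed name, not given in the commit
    learning_rate=1e-4,
    per_device_train_batch_size=4,    # train_batch_size: 4
    per_device_eval_batch_size=4,     # eval_batch_size: 4
    seed=42,
    tpu_num_cores=8,                  # distributed_type: tpu, num_devices: 8
    gradient_accumulation_steps=16,   # 4 per device x 8 devices x 16 steps = 512 total
    adam_beta1=0.9,                   # Adam with betas=(0.9, 0.98)
    adam_beta2=0.98,
    adam_epsilon=1e-6,
    lr_scheduler_type="linear",
    warmup_steps=25_000,
    max_steps=250_000,
)
```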
 
  ---
+ license: mit
+ language: uk
  ---

+ # roberta-base-wechsel-ukrainian

+ [`roberta-base`](https://huggingface.co/roberta-base) transferred to Ukrainian using the method from the NAACL 2022 paper [WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models](https://arxiv.org/abs/2112.06598).
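For readers unfamiliar with WECHSEL, the transfer step looks roughly as follows with the authors' `wechsel` library. This is a minimal sketch adapted from the library's documented usage, not the exact script behind this model; the OSCAR split and the `"ukrainian"` bilingual dictionary name are assumptions:

```python
import torch
from transformers import AutoModel, AutoTokenizer
from datasets import load_dataset
from wechsel import WECHSEL, load_embeddings

# Start from the English source model and tokenizer.
source_tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModel.from_pretrained("roberta-base")

# Train a Ukrainian tokenizer with the same vocabulary size.
target_tokenizer = source_tokenizer.train_new_from_iterator(
    load_dataset("oscar", "unshuffled_deduplicated_uk", split="train")["text"],
    vocab_size=len(source_tokenizer),
)

# Initialize Ukrainian subword embeddings from aligned fastText embeddings.
wechsel = WECHSEL(
    load_embeddings("en"),
    load_embeddings("uk"),
    bilingual_dictionary="ukrainian",  # assumed dictionary name
)
target_embeddings, info = wechsel.apply(
    source_tokenizer,
    target_tokenizer,
    model.get_input_embeddings().weight.detach().numpy(),
)
model.get_input_embeddings().weight.data = torch.from_numpy(target_embeddings)

# `model` and `target_tokenizer` are now ready for continued pretraining on Ukrainian text.
```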
+ # Evaluation

+ Evaluation was done on [lang-uk's ner-uk project](https://github.com/lang-uk/ner-uk), the Ukrainian portion of [WikiANN](https://huggingface.co/datasets/wikiann) and the [Ukrainian IU corpus from the Universal Dependencies project](https://github.com/UniversalDependencies/UD_Ukrainian-IU).
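For illustration, the two Hub-hosted evaluation sets can be loaded with `datasets` (the lang-uk NER corpus is distributed through its GitHub repository instead). The config names below are assumptions based on the standard dataset scripts, and the exact splits used for the numbers in this card are not specified here:

```python
from datasets import load_dataset

# Ukrainian portion of WikiANN (NER, Micro F1 in the tables below).
wikiann_uk = load_dataset("wikiann", "uk")

# Ukrainian IU treebank from Universal Dependencies (POS accuracy below).
ud_uk_iu = load_dataset("universal_dependencies", "uk_iu")

print(wikiann_uk["validation"][0]["tokens"])
print(ud_uk_iu["validation"][0]["upos"])
```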
+ __Validation Results__

+ | Model | lang-uk NER (Micro F1) | WikiANN (Micro F1) | UD Ukrainian IU POS (Accuracy) |
+ |:-------------------------------------------------|:-------------------------|:-------------|:-------------------------|
+ | roberta-base-wechsel-ukrainian | 88.06 (0.50) | 92.96 (0.08) | 98.70 (0.05) |
+ | roberta-large-wechsel-ukrainian | __89.27 (0.53)__ | __93.22 (0.15)__ | __98.86 (0.03)__ |
+ | roberta-base-scratch-ukrainian* | 85.49 (0.88) | 91.91 (0.08) | 98.49 (0.04) |
+ | roberta-large-scratch-ukrainian* | 86.54 (0.70) | 92.39 (0.16) | 98.65 (0.09) |
+ | dbmdz/electra-base-ukrainian-cased-discriminator | 87.49 (0.52) | 93.20 (0.16) | 98.60 (0.03) |
+ | xlm-roberta-base | 86.68 (0.44) | 92.41 (0.13) | 98.53 (0.02) |
+ | xlm-roberta-large | 86.64 (1.61) | 93.01 (0.13) | 98.71 (0.04) |

+ __Test Results__

+ | Model | lang-uk NER (Micro F1) | WikiANN (Micro F1) | UD Ukrainian IU POS (Accuracy) |
+ |:-------------------------------------------------|:-------------------------|:-------------|:-------------------------|
+ | roberta-base-wechsel-ukrainian | 90.81 (1.51) | 92.98 (0.12) | 98.57 (0.03) |
+ | roberta-large-wechsel-ukrainian | __91.24 (1.16)__ | __93.22 (0.17)__ | __98.74 (0.06)__ |
+ | roberta-base-scratch-ukrainian* | 89.57 (1.01) | 92.05 (0.09) | 98.31 (0.08) |
+ | roberta-large-scratch-ukrainian* | 89.96 (0.89) | 92.49 (0.15) | 98.52 (0.04) |
+ | dbmdz/electra-base-ukrainian-cased-discriminator | 90.43 (1.29) | 92.99 (0.11) | 98.59 (0.06) |
+ | xlm-roberta-base | 90.86 (0.81) | 92.27 (0.09) | 98.45 (0.07) |
+ | xlm-roberta-large | 90.16 (2.98) | 92.92 (0.19) | 98.71 (0.04) |

+ \*trained with the exact same training setup as the wechsel-\* models, but without parameter transfer.

+ # License

+ MIT