benjamin committed on
Commit
d63f342
1 Parent(s): da6967f

Update README.md

Files changed (1)
  1. README.md +29 -42
README.md CHANGED
@@ -1,56 +1,43 @@
  ---
- tags:
- - generated_from_trainer
- model-index:
- - name: roberta_large_ukrainian
-   results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->

- # roberta_large_ukrainian

- This model is a fine-tuned version of [roberta_large_uk](https://huggingface.co/roberta_large_uk) on the None dataset.

- ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

- ## Training and evaluation data

- More information needed

- ## Training procedure

- ### Training hyperparameters

- The following hyperparameters were used during training:
- - learning_rate: 0.0001
- - train_batch_size: 4
- - eval_batch_size: 4
- - seed: 42
- - distributed_type: tpu
- - num_devices: 8
- - gradient_accumulation_steps: 16
- - total_train_batch_size: 512
- - total_eval_batch_size: 32
- - optimizer: Adam with betas=(0.9,0.98) and epsilon=1e-06
- - lr_scheduler_type: linear
- - lr_scheduler_warmup_steps: 25000
- - training_steps: 250000
-
- ### Training results
-
-
- ### Framework versions
-
- - Transformers 4.18.0.dev0
- - Pytorch 1.10.0+cu102
- - Datasets 1.18.4
- - Tokenizers 0.11.6
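As a side note, the hyperparameter list in the removed card maps one-to-one onto the standard `transformers` `TrainingArguments`. A minimal sketch of that configuration follows; the `output_dir` and the surrounding training script are assumptions, not part of this commit:

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the removed card's training configuration;
# every value below is taken verbatim from the hyperparameter list above.
training_args = TrainingArguments(
    output_dir="roberta_large_ukrainian",  # assumed name, not given in the commit
    learning_rate=1e-4,
    per_device_train_batch_size=4,    # train_batch_size: 4
    per_device_eval_batch_size=4,     # eval_batch_size: 4
    seed=42,
    tpu_num_cores=8,                  # distributed_type: tpu, num_devices: 8
    gradient_accumulation_steps=16,   # 4 per device x 8 devices x 16 steps = 512 total
    adam_beta1=0.9,                   # Adam with betas=(0.9, 0.98)
    adam_beta2=0.98,
    adam_epsilon=1e-6,
    lr_scheduler_type="linear",
    warmup_steps=25_000,
    max_steps=250_000,
)
```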
 
  ---
+ license: mit
+ language: uk
  ---

+ # roberta-base-wechsel-ukrainian

+ [`roberta-base`](https://huggingface.co/roberta-base) transferred to Ukrainian using the method from the NAACL 2022 paper [WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models](https://arxiv.org/abs/2112.06598).
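For readers unfamiliar with WECHSEL, the transfer step looks roughly as follows with the authors' `wechsel` library. This is a minimal sketch adapted from the library's documented usage, not the exact script behind this model; the OSCAR split and the `"ukrainian"` bilingual dictionary name are assumptions:

```python
import torch
from transformers import AutoModel, AutoTokenizer
from datasets import load_dataset
from wechsel import WECHSEL, load_embeddings

# Start from the English source model and tokenizer.
source_tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModel.from_pretrained("roberta-base")

# Train a Ukrainian tokenizer with the same vocabulary size.
target_tokenizer = source_tokenizer.train_new_from_iterator(
    load_dataset("oscar", "unshuffled_deduplicated_uk", split="train")["text"],
    vocab_size=len(source_tokenizer),
)

# Initialize Ukrainian subword embeddings from aligned fastText embeddings.
wechsel = WECHSEL(
    load_embeddings("en"),
    load_embeddings("uk"),
    bilingual_dictionary="ukrainian",  # assumed dictionary name
)
target_embeddings, info = wechsel.apply(
    source_tokenizer,
    target_tokenizer,
    model.get_input_embeddings().weight.detach().numpy(),
)
model.get_input_embeddings().weight.data = torch.from_numpy(target_embeddings)

# `model` and `target_tokenizer` are now ready for continued pretraining on Ukrainian text.
```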
+ # Evaluation

+ Evaluation was done on [lang-uk's ner-uk project](https://github.com/lang-uk/ner-uk), the Ukrainian portion of [WikiANN](https://huggingface.co/datasets/wikiann) and the [Ukrainian IU corpus from the Universal Dependencies project](https://github.com/UniversalDependencies/UD_Ukrainian-IU).
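For illustration, the two Hub-hosted evaluation sets can be loaded with `datasets` (the lang-uk NER corpus is distributed through its GitHub repository instead). The config names below are assumptions based on the standard dataset scripts, and the exact splits used for the numbers in this card are not specified here:

```python
from datasets import load_dataset

# Ukrainian portion of WikiANN (NER, Micro F1 in the tables below).
wikiann_uk = load_dataset("wikiann", "uk")

# Ukrainian IU treebank from Universal Dependencies (POS accuracy below).
ud_uk_iu = load_dataset("universal_dependencies", "uk_iu")

print(wikiann_uk["validation"][0]["tokens"])
print(ud_uk_iu["validation"][0]["upos"])
```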
+ __Validation Results__

+ | Model | lang-uk NER (Micro F1) | WikiANN (Micro F1) | UD Ukrainian IU POS (Accuracy) |
+ |:-------------------------------------------------|:-------------------------|:-------------|:-------------------------|
+ | roberta-base-wechsel-ukrainian | 88.06 (0.50) | 92.96 (0.08) | 98.70 (0.05) |
+ | roberta-large-wechsel-ukrainian | __89.27 (0.53)__ | __93.22 (0.15)__ | __98.86 (0.03)__ |
+ | roberta-base-scratch-ukrainian* | 85.49 (0.88) | 91.91 (0.08) | 98.49 (0.04) |
+ | roberta-large-scratch-ukrainian* | 86.54 (0.70) | 92.39 (0.16) | 98.65 (0.09) |
+ | dbmdz/electra-base-ukrainian-cased-discriminator | 87.49 (0.52) | 93.20 (0.16) | 98.60 (0.03) |
+ | xlm-roberta-base | 86.68 (0.44) | 92.41 (0.13) | 98.53 (0.02) |
+ | xlm-roberta-large | 86.64 (1.61) | 93.01 (0.13) | 98.71 (0.04) |

+ __Test Results__

+ | Model | lang-uk NER (Micro F1) | WikiANN (Micro F1) | UD Ukrainian IU POS (Accuracy) |
+ |:-------------------------------------------------|:-------------------------|:-------------|:-------------------------|
+ | roberta-base-wechsel-ukrainian | 90.81 (1.51) | 92.98 (0.12) | 98.57 (0.03) |
+ | roberta-large-wechsel-ukrainian | __91.24 (1.16)__ | __93.22 (0.17)__ | __98.74 (0.06)__ |
+ | roberta-base-scratch-ukrainian* | 89.57 (1.01) | 92.05 (0.09) | 98.31 (0.08) |
+ | roberta-large-scratch-ukrainian* | 89.96 (0.89) | 92.49 (0.15) | 98.52 (0.04) |
+ | dbmdz/electra-base-ukrainian-cased-discriminator | 90.43 (1.29) | 92.99 (0.11) | 98.59 (0.06) |
+ | xlm-roberta-base | 90.86 (0.81) | 92.27 (0.09) | 98.45 (0.07) |
+ | xlm-roberta-large | 90.16 (2.98) | 92.92 (0.19) | 98.71 (0.04) |

+ \*trained with the exact same training setup as the wechsel-\* models, but without parameter transfer.

+ # License

+ MIT