File size: 1,931 Bytes
c9598ff
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
---
language: de
license: mit
---

# gpt2-wechsel-german

Model trained with WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models.

See the code here: https://github.com/CPJKU/wechsel

And the paper here: https://arxiv.org/abs/2112.06598

## Performance

### RoBERTa

| Model | NLI Score | NER Score | Avg Score |
|---|---|---|---|
| `roberta-base-wechsel-french` | **82.43** | **90.88** | **86.65** |
| `camembert-base`   | 80.88 | 90.26 | 85.57 |


| Model | NLI Score | NER Score | Avg Score |
|---|---|---|---|
| `roberta-base-wechsel-german` | **81.79** | **89.72** | **85.76** |
| `deepset/gbert-base`   | 78.64 | 89.46 | 84.05 |

| Model | NLI Score | NER Score | Avg Score |
|---|---|---|---|
| `roberta-base-wechsel-chinese` | **78.32** | 80.55 | **79.44** |
| `bert-base-chinese`   | 76.55 | **82.05** | 79.30 |

| Model | NLI Score | NER Score | Avg Score |
|---|---|---|---|
| `roberta-base-wechsel-swahili` | **75.05** | **87.39** | **81.22** |
| `xlm-roberta-base`   | 69.18 | 87.37 | 78.28 |

### GPT2

| Model | PPL |
|---|---|
| `gpt2-wechsel-french` | **19.71** |
| `gpt2` (retrained from scratch)  | 20.47 |

| Model | PPL |
|---|---|
| `gpt2-wechsel-german` | **26.8** |
| `gpt2` (retrained from scratch)  | 27.63 |

| Model | PPL |
|---|---|
| `gpt2-wechsel-chinese` | **51.97** |
| `gpt2` (retrained from scratch)  | 52.98 |

| Model | PPL |
|---|---|
| `gpt2-wechsel-swahili` | **10.14** |
| `gpt2` (retrained from scratch)  | 10.58 |

See our paper for details.

## Citation

Please cite WECHSEL as

```
@misc{minixhofer2021wechsel,
      title={WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models}, 
      author={Benjamin Minixhofer and Fabian Paischer and Navid Rekabsaz},
      year={2021},
      eprint={2112.06598},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```