---
tags:
- adapter-transformers
- adapterhub:am/wikipedia-amharic-20240320
- xlm-roberta-base
datasets:
- wikipedia
pipeline_tag: fill-mask
---

# Adapter `solwol/xml-roberta-base-adapter-amharic` for xlm-roberta-base

An [adapter](https://adapterhub.ml) for the `xlm-roberta-base` model, trained on the [am/wikipedia-amharic-20240320](https://adapterhub.ml/explore/am/wikipedia-amharic-20240320/) dataset, with a prediction head for masked language modeling.

This adapter was created for usage with the **[Adapters](https://github.com/Adapter-Hub/adapters)** library.

## Usage

First, install the `transformers` and `adapters` packages:

```bash
pip install -U transformers adapters
```

Now, the adapter can be loaded and activated like this:

```python
from adapters import AutoAdapterModel

model = AutoAdapterModel.from_pretrained("xlm-roberta-base")
adapter_name = model.load_adapter("solwol/xml-roberta-base-adapter-amharic", source="hf", set_active=True)
```
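`load_adapter` returns the name under which the adapter was registered. As a sanity check, you can confirm it is now the active adapter (the `active_adapters` property is assumed from recent releases of the Adapters library):

```python
# Sanity check: the loaded adapter should now be the active one.
# `active_adapters` is assumed from recent Adapters releases.
print(adapter_name)
print(model.active_adapters)
```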
Next, to perform the fill-mask task, wrap the model and the base model's tokenizer in a pipeline:

```python
from transformers import AutoTokenizer, FillMaskPipeline

# The adapter reuses the base model's tokenizer
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
fillmask = FillMaskPipeline(model=model, tokenizer=tokenizer)

# Amharic example sentences, each containing one <mask> token
inputs = ["แˆ˜แˆแŠซแˆ แŠ แ‹ฒแˆต <mask> แ‹ญแˆแŠ•",
          "แ‹จแŠขแ‰ตแ‹ฎแŒตแ‹ซ แ‹‹แŠ“ <mask> แŠ แ‹ฒแˆต แŠ แ‰ แ‰ฃ แŠแ‹",
          "แŠฌแŠ•แ‹ซ แ‹จ แŠขแ‰ตแ‹ฎแŒตแ‹ซ แŠ แ‹‹แˆณแŠ <mask> แŠ แŠ•แ‹ท แŠ“แ‰ต",
          "แŠ แŒผ แˆแŠ’แˆŠแŠญ แ‹จแŠขแ‰ตแ‹ฎแŒตแ‹ซ <mask> แŠแ‰ แˆฉ"]

outputs = fillmask(inputs)
outputs[0]

# Top-5 predictions for the first sentence:
[{'score': 0.4049586057662964,
  'token': 98040,
  'token_str': 'แŠ แˆ˜แ‰ต',
  'sequence': 'แˆ˜แˆแŠซแˆ แŠ แ‹ฒแˆต แŠ แˆ˜แ‰ต แ‹ญแˆแŠ•'},
 {'score': 0.21424812078475952,
  'token': 48425,
  'token_str': 'แ‹˜แˆ˜แŠ•',
  'sequence': 'แˆ˜แˆแŠซแˆ แŠ แ‹ฒแˆต แ‹˜แˆ˜แŠ• แ‹ญแˆแŠ•'},
 {'score': 0.2039182484149933,
  'token': 25186,
  'token_str': 'แ‹“แˆ˜แ‰ต',
  'sequence': 'แˆ˜แˆแŠซแˆ แŠ แ‹ฒแˆต แ‹“แˆ˜แ‰ต แ‹ญแˆแŠ•'},
 {'score': 0.06508922576904297,
  'token': 17733,
  'token_str': 'แ‰€แŠ•',
  'sequence': 'แˆ˜แˆแŠซแˆ แŠ แ‹ฒแˆต แ‰€แŠ• แ‹ญแˆแŠ•'},
 {'score': 0.018085109069943428,
  'token': 38455,
  'token_str': 'แ‹“แˆˆแˆ',
  'sequence': 'แˆ˜แˆแŠซแˆ แŠ แ‹ฒแˆต แ‹“แˆˆแˆ แ‹ญแˆแŠ•'}]
```
## Fine-tuning data

Amharic Wikipedia, snapshot date `20240320`.
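The corpus can presumably be rebuilt with the legacy `wikipedia` loading script from the `datasets` library. The sketch below is an assumption about how the snapshot was obtained, not the documented preprocessing for this adapter:

```python
from datasets import load_dataset

# Hypothetical reconstruction of the fine-tuning corpus: the legacy
# "wikipedia" script parses an arbitrary dump on the fly, which requires
# `apache_beam` and `mwparserfromhell` to be installed.
dataset = load_dataset(
    "wikipedia",
    language="am",
    date="20240320",
    beam_runner="DirectRunner",
)
print(dataset["train"][0]["text"][:200])
```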