---
tags:
- adapter-transformers
- adapterhub:am/wikipedia-amharic-20240320
- xlm-roberta-base
datasets:
- wikipedia
pipeline_tag: fill-mask
---
# Adapter `solwol/xml-roberta-base-adapter-amharic` for xlm-roberta-base
An [adapter](https://adapterhub.ml) for the `xlm-roberta-base` model, trained on the [am/wikipedia-amharic-20240320](https://adapterhub.ml/explore/am/wikipedia-amharic-20240320/) dataset. It includes a prediction head for masked language modeling.
This adapter was created for use with the **[Adapters](https://github.com/Adapter-Hub/adapters)** library.
## Usage
First, install `transformers` and `adapters`:
```
pip install -U transformers adapters
```
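To confirm that both packages were installed under the expected names, a quick check such as the following can help. This is a minimal sketch using only the Python standard library; `installed_version` is a hypothetical helper introduced here for illustration, not part of `transformers` or `adapters`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report the status of the two packages this adapter needs.
for name in ("transformers", "adapters"):
    print(name, installed_version(name) or "not installed")
```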
Now, the adapter can be loaded and activated like this:
```python
from adapters import AutoAdapterModel

# Load the base model, then fetch the adapter from the Hugging Face Hub and activate it.
model = AutoAdapterModel.from_pretrained("xlm-roberta-base")
adapter_name = model.load_adapter("solwol/xml-roberta-base-adapter-amharic", source="hf", set_active=True)
```
Next, to perform the fill-mask task:
```python
from transformers import AutoTokenizer, FillMaskPipeline

# Use the base model's tokenizer; the pipeline wraps the adapter-enabled model.
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
fillmask = FillMaskPipeline(model=model, tokenizer=tokenizer)
inputs = ["แแแซแ แ แฒแต <mask> แญแแ",
"แจแขแตแฎแตแซ แแ <mask> แ แฒแต แ แ แฃ แแ",
"แฌแแซ แจ แขแตแฎแตแซ แ แแณแ <mask> แ แแท แแต",
"แ แผ แแแแญ แจแขแตแฎแตแซ <mask> แแ แฉ"]
outputs = fillmask(inputs)
# Predictions for the first input sentence (notebook-style output follows):
outputs[0]
[{'score': 0.4049586057662964,
'token': 98040,
'token_str': 'แ แแต',
'sequence': 'แแแซแ แ แฒแต แ แแต แญแแ'},
{'score': 0.21424812078475952,
'token': 48425,
'token_str': 'แแแ',
'sequence': 'แแแซแ แ แฒแต แแแ แญแแ'},
{'score': 0.2039182484149933,
'token': 25186,
'token_str': 'แแแต',
'sequence': 'แแแซแ แ แฒแต แแแต แญแแ'},
{'score': 0.06508922576904297,
'token': 17733,
'token_str': 'แแ',
'sequence': 'แแแซแ แ แฒแต แแ แญแแ'},
{'score': 0.018085109069943428,
'token': 38455,
'token_str': 'แแแ',
'sequence': 'แแแซแ แ แฒแต แแแ แญแแ'}]
```
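For a list of several inputs, the pipeline returns one list of candidate dictionaries per sentence, each with `score`, `token`, `token_str`, and `sequence` keys, as in the output above. The sketch below keeps only the highest-scoring candidate for each sentence. `top_predictions` is a hypothetical helper, not part of `transformers` or `adapters`, and it assumes the list-of-lists shape produced when more than one input is passed, each containing a single `<mask>`.

```python
def top_predictions(outputs):
    """Return (token_str, score) for the best candidate of each input sentence.

    Assumes `outputs` is a list with one entry per input sentence, where each
    entry is a list of dicts containing at least "score" and "token_str".
    """
    best = []
    for candidates in outputs:
        # Select by score explicitly instead of relying on the list order.
        top = max(candidates, key=lambda c: c["score"])
        best.append((top["token_str"], top["score"]))
    return best
```

Calling `top_predictions(outputs)` on the result of `fillmask(inputs)` would then give one `(token_str, score)` pair per input sentence.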
## Fine-tuning data
Amharic Wikipedia dataset, snapshot date "20240320".