---
tags:
- adapter-transformers
- adapterhub:am/wikipedia-amharic-20240320
- xlm-roberta-base
datasets:
- wikipedia
pipeline_tag: fill-mask
---

# Adapter `solwol/xml-roberta-base-adapter-amharic` for xlm-roberta-base

An [adapter](https://adapterhub.ml) for the `xlm-roberta-base` model, trained on the [am/wikipedia-amharic-20240320](https://adapterhub.ml/explore/am/wikipedia-amharic-20240320/) dataset. It includes a prediction head for masked language modeling.

This adapter was created for use with the **[Adapters](https://github.com/Adapter-Hub/adapters)** library.

## Usage

First, install `transformers` and `adapters`:

```
pip install -U transformers adapters
```
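
To confirm that both packages are available before loading the model, a small check along these lines may help. This is only a sketch: `installed_version` is a hypothetical helper name, and it assumes the packages are published under the distribution names `transformers` and `adapters`, as in the install command.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing.

    `installed_version` is a hypothetical helper, not part of either library.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the status of the two packages named in the install step.
for name in ("transformers", "adapters"):
    version = installed_version(name)
    print(f"{name}: {version or 'not installed'}")
```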

Now, the adapter can be loaded and activated like this:

```python
from adapters import AutoAdapterModel

# Load the base model, then fetch the adapter from the Hugging Face Hub and activate it
model = AutoAdapterModel.from_pretrained("xlm-roberta-base")
adapter_name = model.load_adapter("solwol/xml-roberta-base-adapter-amharic", source="hf", set_active=True)
```

Next, run the fill-mask task:

```python
from transformers import AutoTokenizer, FillMaskPipeline

# Use the base model's tokenizer together with the adapter-enabled model
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
fillmask = FillMaskPipeline(model=model, tokenizer=tokenizer)

inputs = ["แแแซแ แ แฒแต <mask> แญแแ",
          "แจแขแตแฎแตแซ แแ <mask> แ แฒแต แ แ แฃ แแ",
          "แฌแแซ แจ แขแตแฎแตแซ แ แแณแ <mask> แ แแท แแต",
          "แ แผ แแแแญ แจแขแตแฎแตแซ <mask> แแ แฉ"]

outputs = fillmask(inputs)
outputs[0]

# Predictions for the first input:
[{'score': 0.4049586057662964,
  'token': 98040,
  'token_str': 'แ แแต',
  'sequence': 'แแแซแ แ แฒแต แ แแต แญแแ'},
 {'score': 0.21424812078475952,
  'token': 48425,
  'token_str': 'แแแ',
  'sequence': 'แแแซแ แ แฒแต แแแ แญแแ'},
 {'score': 0.2039182484149933,
  'token': 25186,
  'token_str': 'แแแต',
  'sequence': 'แแแซแ แ แฒแต แแแต แญแแ'},
 {'score': 0.06508922576904297,
  'token': 17733,
  'token_str': 'แแ',
  'sequence': 'แแแซแ แ แฒแต แแ แญแแ'},
 {'score': 0.018085109069943428,
  'token': 38455,
  'token_str': 'แแแ',
  'sequence': 'แแแซแ แ แฒแต แแแ แญแแ'}]
```
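
The pipeline returns one list of candidate dictionaries per input, each with `score`, `token`, `token_str`, and `sequence` keys. To keep only the best fills, a helper like the one below could be used. This is a minimal sketch: `top_predictions` is a hypothetical name, and the sample data is made up to mimic the output shape rather than taken from the model.

```python
from typing import Any


def top_predictions(outputs: list[Any], k: int = 1) -> list[list[tuple[str, float]]]:
    """Return the k highest-scoring (token_str, score) pairs for each input.

    Hypothetical helper; it only assumes the nested list-of-dicts shape
    that the fill-mask pipeline produces.
    """
    # A single input string yields a flat list of dicts; wrap it so both shapes match.
    if outputs and isinstance(outputs[0], dict):
        outputs = [outputs]
    results = []
    for candidates in outputs:
        ranked = sorted(candidates, key=lambda c: c["score"], reverse=True)
        results.append([(c["token_str"], c["score"]) for c in ranked[:k]])
    return results


# Made-up sample in the pipeline's output shape (not real model output).
sample_outputs = [
    [{"score": 0.7, "token": 1, "token_str": "foo", "sequence": "a foo b"},
     {"score": 0.2, "token": 2, "token_str": "bar", "sequence": "a bar b"}],
    [{"score": 0.9, "token": 3, "token_str": "baz", "sequence": "c baz d"}],
]

best = top_predictions(sample_outputs, k=1)
print(best)
```

With real results, `top_predictions(outputs, k=3)` would give the three most likely fills for each sentence in `inputs`.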

## Fine-tuning data

Amharic Wikipedia dataset, snapshot dated "20240320".