Pipeline task: fill-mask (mask token: <mask>)
API endpoint
								curl -X POST \
-H "Authorization: Bearer YOUR_ORG_OR_USER_API_TOKEN" \
-H "Content-Type: application/json" \
-d '"json encoded string"' \
https://api-inference.huggingface.co/models/MoseliMotsoehli/zuBERTa
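The same POST can be built from Python with only the standard library. This is a minimal sketch, assuming the endpoint accepts a JSON body of the form {"inputs": "..."}, which is the payload shape the Hugging Face Inference API documents for fill-mask. `build_fill_mask_request` is a hypothetical helper name and the token is a placeholder. The function only builds the request; sending it requires network access and a valid token.

```python
import json
import urllib.request

# Endpoint copied from the curl example above.
API_URL = "https://api-inference.huggingface.co/models/MoseliMotsoehli/zuBERTa"


def build_fill_mask_request(text: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request equivalent to the curl command.

    Assumption: the endpoint accepts a JSON object {"inputs": text}.
    build_fill_mask_request is a hypothetical helper, not part of any library.
    """
    body = json.dumps({"inputs": text}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_fill_mask_request(
    "Abafika eNkandla bafika sebeholwa <mask> uMpongo kaZingelwayo.",
    "YOUR_ORG_OR_USER_API_TOKEN",
)

# To actually query the API (needs network access and a real token):
# with urllib.request.urlopen(req) as resp:
#     predictions = json.load(resp)
```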

Frameworks: PyTorch, TensorFlow

Contributed by: Moseli Motsoehli (MoseliMotsoehli)

How to use this model directly from the 🤗/transformers library:

			
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("MoseliMotsoehli/zuBERTa")
model = AutoModelForMaskedLM.from_pretrained("MoseliMotsoehli/zuBERTa")

zuBERTa

zuBERTa is a RoBERTa-style transformer language model trained on Zulu text.

Intended uses & limitations

The model can be used to extract embeddings for downstream tasks such as question answering.
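The card does not say how embeddings should be extracted. One common approach, assumed here rather than taken from the author, is to load the encoder with AutoModel and mean-pool the final hidden states over non-padding tokens. `mean_pool` and `embed` are hypothetical helper names; mean pooling is a design choice, and taking the vector of the first (<s>) token is an equally common alternative.

```python
from typing import List


def mean_pool(token_vectors: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the vectors of tokens whose attention-mask entry is 1 (skips padding)."""
    kept = [vec for vec, m in zip(token_vectors, attention_mask) if m == 1]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def embed(text: str, tokenizer, model) -> List[float]:
    """Return one fixed-size vector for `text` (hypothetical helper).

    Assumes `model` was loaded with transformers.AutoModel, so its output has
    `last_hidden_state`, and `tokenizer` with transformers.AutoTokenizer.
    """
    import torch  # imported here so mean_pool can be used without torch installed

    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    vectors = outputs.last_hidden_state[0].tolist()  # [seq_len][hidden_size]
    mask = inputs["attention_mask"][0].tolist()
    return mean_pool(vectors, mask)
```

To use it (this downloads the weights), load `tokenizer = AutoTokenizer.from_pretrained("MoseliMotsoehli/zuBERTa")` and `model = AutoModel.from_pretrained("MoseliMotsoehli/zuBERTa")`, then call `embed(sentence, tokenizer, model)`. For a single unpadded sentence every mask entry is 1; the mask only changes the result once padded batches are involved.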

How to use

>>> from transformers import pipeline
>>> from transformers import AutoTokenizer, AutoModelForMaskedLM

>>> tokenizer = AutoTokenizer.from_pretrained("MoseliMotsoehli/zuBERTa")
>>> model = AutoModelForMaskedLM.from_pretrained("MoseliMotsoehli/zuBERTa")
>>> unmasker = pipeline('fill-mask', model=model, tokenizer=tokenizer)
>>> unmasker("Abafika eNkandla bafika sebeholwa <mask> uMpongo kaZingelwayo.")

[
  {
    "sequence": "<s>Abafika eNkandla bafika sebeholwa khona uMpongo kaZingelwayo.</s>",
    "score": 0.050459690392017365,
    "token": 555,
    "token_str": "Ġkhona"
  },
  {
    "sequence": "<s>Abafika eNkandla bafika sebeholwa inkosi uMpongo kaZingelwayo.</s>",
    "score": 0.03668094798922539,
    "token": 2321,
    "token_str": "Ġinkosi"
  },
  {
    "sequence": "<s>Abafika eNkandla bafika sebeholwa ubukhosi uMpongo kaZingelwayo.</s>",
    "score": 0.028774697333574295,
    "token": 5101,
    "token_str": "Ġubukhosi"
  }
]
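The token_str values carry a leading "Ġ", the symbol byte-level BPE tokenizers such as RoBERTa's use for a preceding space. A small post-processing sketch that strips the marker and ranks candidates by score might look like this; `rank_predictions` is a hypothetical helper, and the input is the example output shown above with the sequence and token fields omitted.

```python
from typing import Dict, List, Tuple

SPACE_MARKER = "\u0120"  # "Ġ": byte-level BPE symbol for a leading space


def rank_predictions(results: List[Dict]) -> List[Tuple[str, float]]:
    """Return (word, score) pairs, highest score first, with the space marker removed."""
    cleaned = [(r["token_str"].lstrip(SPACE_MARKER), r["score"]) for r in results]
    return sorted(cleaned, key=lambda pair: pair[1], reverse=True)


# The three predictions from the fill-mask example above.
example = [
    {"token_str": "Ġkhona", "score": 0.050459690392017365},
    {"token_str": "Ġinkosi", "score": 0.03668094798922539},
    {"token_str": "Ġubukhosi", "score": 0.028774697333574295},
]

for word, score in rank_predictions(example):
    print(f"{word}\t{score:.3f}")
```

More recent transformers releases may already return token_str without the marker; lstrip leaves such strings unchanged.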

Training data

  1. 30k sentences from the 2018 Zulu collection of the Leipzig Corpora Collection, drawn from news articles and creative writing.
  2. ~7500 articles of human-generated translations, scraped from the Zulu Wikipedia.

BibTeX entry and citation info

@inproceedings{motsoehli2020towards,
  author = {Moseli Motsoehli},
  title = {Towards transformation of Southern African language models through transformers.},
  year = {2020}
}