Commit da81d7b (parent: 364741c) by Michael Beukman
Fixed a typo.

README.md CHANGED
@@ -21,7 +21,7 @@ More information, and other similar models can be found in the [main Github repo

 ## About
 This model is transformer based and was fine-tuned on the MasakhaNER dataset, a named entity recognition dataset containing mostly news articles in 10 different African languages.
-The model was fine-tuned for 50 epochs, with a maximum sequence length of 200, a batch size of 32, and a learning rate of 5e-5. This process was repeated 5 times (with different random seeds), and this uploaded model performed the best out of those 5 seeds (aggregate F1 on
+The model was fine-tuned for 50 epochs, with a maximum sequence length of 200, a batch size of 32, and a learning rate of 5e-5. This process was repeated 5 times (with different random seeds), and this uploaded model performed the best out of those 5 seeds (aggregate F1 on the test set).

 This model was fine-tuned by me, Michael Beukman, while doing a project at the University of the Witwatersrand, Johannesburg. This is version 1, as of 20 November 2021.
 This model is licensed under the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0).
@@ -110,7 +110,7 @@ tokenizer = AutoTokenizer.from_pretrained(model_name)
 model = AutoModelForTokenClassification.from_pretrained(model_name)

 nlp = pipeline("ner", model=model, tokenizer=tokenizer)
-example = "
+example = "Wizara ya afya ya Tanzania imeripoti Jumatatu kuwa , watu takriban 14 zaidi wamepata maambukizi ya Covid - 19 ."
 ner_results = nlp(example)
 print(ner_results)
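The first hunk describes the selection procedure: train with 5 different random seeds and upload the checkpoint with the best aggregate F1 on the test set. A minimal sketch of that selection, assuming "aggregate F1" means micro-averaged F1 over all entity types; the per-type counts and seed scores below are hypothetical placeholders, not results reported for this model:

```python
# Hypothetical per-entity-type counts on a test set (not real results).
# "Aggregate F1" is interpreted here as micro-averaged F1 over all types.
counts = {
    "PER":  {"tp": 120, "fp": 15, "fn": 20},
    "ORG":  {"tp": 90,  "fp": 25, "fn": 30},
    "LOC":  {"tp": 150, "fp": 10, "fn": 15},
    "DATE": {"tp": 60,  "fp": 12, "fn": 8},
}

def micro_f1(counts):
    """Micro-averaged F1: pool TP/FP/FN across all entity types first."""
    tp = sum(c["tp"] for c in counts.values())
    fp = sum(c["fp"] for c in counts.values())
    fn = sum(c["fn"] for c in counts.values())
    return 2 * tp / (2 * tp + fp + fn)

# Hypothetical aggregate F1 per seed; keep the best-performing checkpoint.
seed_f1 = {101: 0.781, 102: 0.774, 103: 0.790, 104: 0.768, 105: 0.785}
best_seed = max(seed_f1, key=seed_f1.get)

print(f"micro F1 = {micro_f1(counts):.4f}, best seed = {best_seed}")
```

Micro-averaging weights every entity mention equally, so frequent types such as LOC dominate the aggregate score; macro-averaging over types would be an alternative reading.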
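The second hunk's `example` sentence is Swahili (roughly: "Tanzania's Ministry of Health reported on Monday that about 14 more people have contracted Covid-19"), and the pipeline returns token-level predictions with BIO tags. A minimal sketch of how such token-level results could be merged into whole entities; the `ner_results` list here is a hypothetical illustration of the output shape, not actual output from this model:

```python
# Hypothetical token-level BIO predictions, illustrating the shape of
# the pipeline's output (not real output from this model).
ner_results = [
    {"word": "Wizara",   "entity": "B-ORG"},
    {"word": "ya",       "entity": "I-ORG"},
    {"word": "afya",     "entity": "I-ORG"},
    {"word": "ya",       "entity": "I-ORG"},
    {"word": "Tanzania", "entity": "I-ORG"},
    {"word": "Jumatatu", "entity": "B-DATE"},
]

def group_entities(results):
    """Collect consecutive B-/I- tokens of the same type into one entity."""
    entities, current_words, current_type = [], [], None
    for tok in results:
        prefix, _, etype = tok["entity"].partition("-")
        if prefix == "B" or etype != current_type:
            # A B- tag or a type change starts a new entity.
            if current_words:
                entities.append((current_type, " ".join(current_words)))
            current_words, current_type = [tok["word"]], etype
        else:
            # An I- tag of the same type continues the current entity.
            current_words.append(tok["word"])
    if current_words:
        entities.append((current_type, " ".join(current_words)))
    return entities

print(group_entities(ner_results))
# → [('ORG', 'Wizara ya afya ya Tanzania'), ('DATE', 'Jumatatu')]
```

Transformers pipelines can also do this grouping internally via an aggregation option, but the manual version above makes the BIO convention explicit.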