---
language: sw
datasets:
- kenyacorpus_v2
license: cc-by-4.0
model-index:
- name: innocent-charles/Swahili-question-answer-latest-cased
  results:
  - task:
      type: question-answering
      name: Question Answering
    dataset:
      name: kenyacorpus
      type: kenyacorpus
      config: kenyacorpus
      split: validation
    metrics:
    - name: Exact Match
      type: exact_match
      value: 79.9309
      verified: true
    - name: F1
      type: f1
      value: 82.9501
      verified: true
    - name: total
      type: total
      value: 11869
      verified: true
---
# Swahili Question-Answer Model

This is the [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased) model, fine-tuned on the [KenyaCorpus](https://github.com/Neurotech-HQ/Swahili-QA-dataset) dataset for extractive question answering in Swahili. It was trained on question-answer pairs, including unanswerable questions.

## Overview

- **Language model used:** bert-base-multilingual-cased
- **Language:** Kiswahili
- **Downstream task:** Extractive Swahili QA
- **Training data:** KenyaCorpus
- **Eval data:** KenyaCorpus
- **Code:** See [an example QA pipeline on Haystack](https://haystack.deepset.ai)
- **Infrastructure:** Google Colab GPU

## Hyperparameters

```
batch_size = 16
n_epochs = 10
base_LM_model = "bert-base-multilingual-cased"
max_seq_len = 386
learning_rate = 3e-5
lr_schedule = LinearWarmup
warmup_proportion = 0.2
doc_stride = 128
max_query_length = 64
```
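
Two of these settings are easier to follow with a small sketch. The card does not name the training framework. The parameter names resemble those used by deepset's FARM, so the code below makes two assumptions. First, `LinearWarmup` is taken to mean a linear warm-up over the first `warmup_proportion` of steps followed by linear decay to zero, as in `transformers.get_linear_schedule_with_warmup`. Second, `doc_stride` is taken to be the step between the starts of consecutive context windows, as in Google's original BERT `run_squad.py`. The function names are hypothetical, and `total_steps` depends on the dataset size, which the card does not give.

```python
from typing import List, Tuple


def linear_warmup_lr(step: int, total_steps: int,
                     base_lr: float = 3e-5, warmup_proportion: float = 0.2) -> float:
    """Learning rate at `step` under an ASSUMED linear warm-up / linear decay schedule."""
    warmup_steps = int(total_steps * warmup_proportion)
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr during warm-up.
        return base_lr * step / max(1, warmup_steps)
    # Afterwards, decay linearly from base_lr to 0 at total_steps.
    remaining = (total_steps - step) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, remaining)


def context_windows(num_context_tokens: int, num_question_tokens: int,
                    max_seq_len: int = 386, doc_stride: int = 128,
                    max_query_length: int = 64) -> List[Tuple[int, int]]:
    """(start, length) of each context window, ASSUMING BERT run_squad.py semantics."""
    # Questions longer than max_query_length are truncated.
    q_len = min(num_question_tokens, max_query_length)
    # Three positions go to the special tokens: [CLS] question [SEP] context [SEP].
    max_context_tokens = max_seq_len - q_len - 3
    windows = []
    start = 0
    while start < num_context_tokens:
        length = min(num_context_tokens - start, max_context_tokens)
        windows.append((start, length))
        if start + length == num_context_tokens:
            break  # this window already reaches the end of the context
        start += min(length, doc_stride)
    return windows
```

For a 1,000-token context and a 20-token question, `context_windows(1000, 20)` returns six windows starting at tokens 0, 128, 256, 384, 512 and 640. Consecutive windows overlap by 363 - 128 = 235 tokens, so an answer near one window's edge still appears whole in a neighbouring window. Note that the `stride` argument of Hugging Face tokenizers uses the other convention: it is the number of overlapping tokens, not the step.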

## Usage

### In Haystack

Haystack is an NLP framework by deepset. You can use this model in a Haystack pipeline to answer questions at scale, over many documents. To load the model in [Haystack](https://github.com/deepset-ai/haystack/):

```python
# Requires Haystack 1.x (farm-haystack); these reader classes are not part of Haystack 2.x
from haystack.nodes import FARMReader, TransformersReader

model_name = "innocent-charles/Swahili-question-answer-latest-cased"
reader = FARMReader(model_name_or_path=model_name)
# or
reader = TransformersReader(model_name_or_path=model_name, tokenizer=model_name)
```

For a complete example of `Swahili-question-answer-latest-cased` being used for Swahili question answering, see the [tutorials in the Haystack documentation](https://haystack.deepset.ai).

### In Transformers

```python
from transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline

model_name = "innocent-charles/Swahili-question-answer-latest-cased"

# a) Get predictions with the question-answering pipeline
nlp = pipeline('question-answering', model=model_name, tokenizer=model_name)
QA_input = {
    'question': 'Asubuhi ilitupata pambajioni pa hospitali gani?',
    'context': 'Asubuhi hiyo ilitupata pambajioni pa hospitali ya Uguzwa.'
}
# Pass handle_impossible_answer=True to allow an empty answer for unanswerable questions
res = nlp(QA_input)
# res is a dict with the keys 'score', 'start', 'end' and 'answer'
print(res['answer'], res['score'])

# b) Load model & tokenizer directly
model = AutoModelForQuestionAnswering.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```
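
Loading the model and tokenizer directly gives you raw `start_logits` and `end_logits` rather than a finished answer. Because the model was trained with unanswerable questions, one common SQuAD 2.0-style way to decode them works in two steps. First, score every valid (start, end) span as `start_logit + end_logit`. Then compare the best span against the "no answer" score at position 0 (the `[CLS]` token). The sketch below shows that idea in plain Python on toy logit lists. The function name and the `max_answer_len` and `null_threshold` parameters are illustrative assumptions, and this is not the exact post-processing used by the `transformers` pipeline.

```python
from typing import List, Optional, Tuple


def best_span(start_logits: List[float], end_logits: List[float],
              context_start: int, context_end: int,
              max_answer_len: int = 30,
              null_threshold: float = 0.0) -> Optional[Tuple[int, int]]:
    """Return (start, end) token indices of the best answer, or None for "no answer".

    Only tokens in [context_start, context_end) may form an answer, so question
    tokens and special tokens are never returned as part of a span.
    """
    # Position 0 ([CLS]) is conventionally used to represent "no answer".
    null_score = start_logits[0] + end_logits[0]

    best, best_score = None, float("-inf")
    for s in range(context_start, context_end):
        # The end must not precede the start, and spans are capped at max_answer_len tokens.
        for e in range(s, min(s + max_answer_len, context_end)):
            score = start_logits[s] + end_logits[e]
            if score > best_score:
                best, best_score = (s, e), score

    # Predict "no answer" when the null score beats the best span by more than the threshold.
    if best is None or null_score - best_score > null_threshold:
        return None
    return best
```

With the real model, the logits would come from `outputs.start_logits` and `outputs.end_logits` after calling `model(**tokenizer(question, context, return_tensors="pt"))`. To recover the answer text, you would also need the context token range, for example from the fast tokenizer's `sequence_ids()`. The returned token indices can then be mapped back to characters with `return_offsets_mapping=True`.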

## Performance

```json
{
  "exact": 79.87029394424324,
  "f1": 82.91251169582613,
  "total": 11873,
  "HasAns_exact": 77.93522267206478,
  "HasAns_f1": 84.02838248389763,
  "HasAns_total": 5928,
  "NoAns_exact": 81.79983179142137,
  "NoAns_f1": 81.79983179142137,
  "NoAns_total": 5945
}
```
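
These field names are the ones produced by the official SQuAD 2.0 evaluation script. `exact` is exact match (EM) and `f1` is token-overlap F1. Each is averaged over questions and also reported separately for answerable (`HasAns`) and unanswerable (`NoAns`) questions. For an unanswerable question the only correct prediction is an empty answer, so EM and F1 coincide, which is why `NoAns_exact` equals `NoAns_f1`. The overall scores are the question-weighted averages of the two subsets, e.g. (77.935 × 5928 + 81.800 × 5945) / 11873 ≈ 79.87. Below is a minimal sketch of the per-question metrics, modelled on the SQuAD script. It assumes a single gold answer per question. It also omits the script's removal of the English articles "a", "an" and "the", which does not apply to Swahili.

```python
import collections
import string


def normalize_answer(text: str) -> str:
    """Lowercase, strip punctuation and collapse whitespace (SQuAD-style, minus English articles)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized prediction equals the normalized gold answer, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(gold))


def f1_score(prediction: str, gold: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    # Unanswerable question (empty gold) or empty prediction: F1 reduces to exact match.
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    # Count tokens shared by prediction and gold (multiset intersection).
    common = collections.Counter(pred_tokens) & collections.Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

For example, the prediction "ya Uguzwa" against the gold answer "hospitali ya Uguzwa" scores 0 for exact match. Its F1 is 0.8, from precision 1 and recall 2/3.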

## Authors

**Innocent Charles:** contact@innocentcharles.com

## About Me

<P>
I build good things using Artificial Intelligence, Data and Analytics. I have over 3 years of experience as an Applied AI Engineer and Data Scientist and a strong background in Software Engineering, with a passion for, and extensive experience in, data and business.
</P>

[LinkedIn](https://www.linkedin.com/in/innocent-charles/) | [GitHub](https://github.com/innocent-charles) | [Website](https://innocentcharles.com)