Update README.md
README.md
@@ -34,14 +34,18 @@ Also, it's worth noting that the model repo doesn't have a ```tokenizer.json```, a
 instead of AutoModel or other modules in Transformers.
 
 ## Source of Funding for this Work
-The dataset
-
+The dataset used to fine-tune this model, as well as the compute resources, were provided by [Opportunity International](https://www.globalcitizen.org/en/partners/opportunity-international/?gad_source=1&gbraid=0AAAAACnN8MzEIzvf0oKqHW5bw14A4IvGY&gclid=CjwKCAjw9p24BhB_EiwA8ID5Bptp-7RgECcozDIe_6Owjb2g0wClWOKv4-NsEdtXpKx4FGPvOlBPQBoC9SMQAvD_BwE).
+This was part of a project in Malawi aimed at supporting the deployment of an LLM-based chatbot for agriculture, with the capability to handle voice interactions in the local language, Chichewa.
+A total of 30 hours of speech was collected for this dataset, but due to data quality issues, only 25 hours were used.
+About 30 minutes was also set aside as a hold-out set for further model evaluation.
 
 ## Training and evaluation data
 
 More information needed
 
 ## Training procedure
+Most of the training for this model involved experimenting with varying speech dataset sizes (5 hours, 10 hours, and so on up to 24 hours).
+As such, the different model commits represent different dataset sizes. More details will be added to each model commit.
 
 ### Training hyperparameters
 
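
Since the added text says each model commit corresponds to a different training-set size, the sketch below shows how one might pull a specific commit with the `revision` argument of `from_pretrained`. This is a hedged example, not the README's own instructions: the repo ID and revision are placeholder values, and the Whisper classes are an assumption based on the speech-recognition task; the hunk-header note above only confirms that a concrete class should be used instead of AutoModel and that the repo has no ```tokenizer.json```.

```python
# Hedged sketch, not the README's official loading instructions.
# MODEL_ID and REVISION are placeholders; the Whisper classes are an
# assumption based on the speech task described in this commit.
from transformers import WhisperForConditionalGeneration, WhisperProcessor

MODEL_ID = "org/chichewa-asr-model"  # placeholder; substitute the real repo ID
REVISION = "main"                    # or the commit hash for a given dataset size

# revision= pins a specific repo commit, which per the README maps to a
# particular amount of training speech (5 h, 10 h, ... up to 24 h).
processor = WhisperProcessor.from_pretrained(MODEL_ID, revision=REVISION)
model = WhisperForConditionalGeneration.from_pretrained(MODEL_ID, revision=REVISION)
```

Pinning a commit hash rather than `main` keeps results reproducible as further commits with different dataset sizes are pushed.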