Commit 954b4a5 by vukosi
1 parent: 18993d5

Update README.md

Files changed (1)
  1. README.md +87 -0
README.md CHANGED
@@ -1,3 +1,90 @@
  ---
  license: cc-by-4.0
+ language:
+ - tn
+ datasets:
+ - dsfsi/PuoData
+ metrics:
+ - f1
+ library_name: transformers
+ pipeline_tag: text-classification
+ tags:
+ - iptc
  ---
+
+
+ # PuoBERTa-News: A Setswana Language Model Finetuned for News Categorisation
+
+ A RoBERTa-based language model finetuned for news categorisation.
+
+ Based on [https://huggingface.co/dsfsi/PuoBERTa](https://huggingface.co/dsfsi/PuoBERTa)
+
+ ## Model Details
+
+ ### Model Description
+
+ This is a news categorisation model for Setswana.
+
+ - **Developed by:** Vukosi Marivate ([@vukosi](https://huggingface.co/@vukosi)), Moseli Mots'Oehli ([@MoseliMotsoehli](https://huggingface.co/@MoseliMotsoehli)), Valencia Wagner, Richard Lastrucci and Isheanesu Dzingirai
+ - **Model type:** RoBERTa Model
+ - **Language(s) (NLP):** Setswana
+ - **License:** CC BY 4.0
+
+ ### News Categories
+
+ - 0: arts_culture_entertainment_and_media
+ - 1: crime_law_and_justice
+ - 2: disaster_accident_and_emergency_incident
+ - 3: economy_business_and_finance
+ - 4: education
+ - 5: environment
+ - 6: health
+ - 7: politics
+ - 8: religion_and_belief
+ - 9: society
+
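The category list above pairs each class index with a category name. A minimal sketch of that mapping as a Python dictionary, which may help when decoding raw class indices. The names `ID2LABEL`, `LABEL2ID` and `decode` are illustrative assumptions and are not taken from the model's published configuration:

```python
# Class indices and names copied from the category list in this model card.
ID2LABEL = {
    0: "arts_culture_entertainment_and_media",
    1: "crime_law_and_justice",
    2: "disaster_accident_and_emergency_incident",
    3: "economy_business_and_finance",
    4: "education",
    5: "environment",
    6: "health",
    7: "politics",
    8: "religion_and_belief",
    9: "society",
}

# Reverse lookup: category name -> class index.
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}


def decode(class_index: int) -> str:
    """Return the category name for a predicted class index."""
    return ID2LABEL[class_index]


print(decode(7))  # politics
```

Whether the released checkpoint exposes these names directly or generic `LABEL_n` identifiers depends on its configuration, so a lookup like this is only needed in the latter case.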
+ ### Model Performance
+
+ Performance of models on the Daily News Dikgang dataset:
+
+ | **Model** | **5-fold Cross Validation F1** | **Test F1** |
+ |-----------------------------|--------------------------------------|-------------------|
+ | Logistic Regression + TFIDF | 60.1 | 56.2 |
+ | NCHLT TSN RoBERTa | 64.7 | 60.3 |
+ | PuoBERTa | 63.8 | 62.9 |
+ | PuoBERTaJW300 | **66.2** | **65.4** |
+
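The table reports F1 scores but the card does not say which averaging was used. As an illustration only, here is a sketch of a macro-averaged F1 in plain Python; the choice of macro averaging and the function name `macro_f1` are assumptions, not the authors' stated evaluation method:

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in either list."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels, not data from the Daily News Dikgang dataset.
score = macro_f1([0, 0, 1, 1, 2], [0, 1, 1, 1, 2])
print(f"{100 * score:.1f}")
```

The result is multiplied by 100 here on the assumption that the table values are percentages.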
+ ### Usage
+
+ Use this model for news categorisation of Setswana text.
+
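A minimal sketch of running the model for news categorisation with the `transformers` text-classification pipeline. The repository id in `MODEL_ID` is an assumption inferred from the card title and is not stated in this card, so replace it with the actual id. The helper names `classify` and `best_label` are illustrative:

```python
# ASSUMPTION: repository id guessed from the card title; replace with the real id.
MODEL_ID = "dsfsi/PuoBERTa-News"


def classify(texts, model_id=MODEL_ID):
    """Return one {'label': ..., 'score': ...} dict per input text."""
    # Imported lazily so the helper below works without transformers installed.
    from transformers import pipeline

    classifier = pipeline("text-classification", model=model_id)
    # truncation=True clips articles longer than the model's maximum input length.
    return classifier(list(texts), truncation=True)


def best_label(prediction):
    """Reduce a pipeline prediction dict to a (label, rounded score) pair."""
    return prediction["label"], round(prediction["score"], 3)
```

Calling `classify(["<Setswana news article text>"])` downloads the checkpoint on first use and returns the top category per article. Depending on the checkpoint's configuration, labels may come back as category names or as generic `LABEL_n` identifiers.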
+ ## Citation Information
+
+ BibTeX reference:
+
+ ```
+ @article{marivatePuoBERTa2023,
+   title={PuoBERTa: Training and evaluation of a curated language model for Setswana},
+   author={Vukosi Marivate and Moseli Mots'Oehli and Valencia Wagner and Richard Lastrucci and Isheanesu Dzingirai},
+   journal={ArXiv},
+ }
+ ```
+
+ ## Contributing
+
+ Your contributions are welcome! Feel free to improve the model.
+
+ ## Model Card Authors
+
+ Vukosi Marivate
+
+ ## Model Card Contact
+
+ For more details, reach out or check our [website](https://dsfsi.github.io/).
+
+ Email: vukosi.marivate@cs.up.ac.za
+
+ **Enjoy exploring Setswana through AI!**