tharindu committed on
Commit 22b2bc9
1 Parent(s): 7fab195

Update README.md

Files changed (1)
  1. README.md +30 -1
README.md CHANGED
@@ -5,4 +5,33 @@ datasets:
  - sinhala-nlp/NSINA-Media
  language:
  - si
- ---
+ ---
+
+ # Sinhala News Media Identification
+ This is a text classification task created with the [NSINA dataset](https://github.com/Sinhala-NLP/NSINA). This dataset is also released with the same license as NSINA.
+
+
+
+ ## Data
+ Data can be loaded into pandas dataframes using the following code.
+
+ ```python
+ from datasets import Dataset
+ from datasets import load_dataset
+
+ train = Dataset.to_pandas(load_dataset('sinhala-nlp/NSINA-Media', split='train'))
+ test = Dataset.to_pandas(load_dataset('sinhala-nlp/NSINA-Media', split='test'))
+ ```
+
+ ## Citation
+ If you are using the dataset or the models, please cite the following paper.
+
+ ~~~
+ @inproceedings{Nsina2024,
+ author={Hettiarachchi, Hansi and Premasiri, Damith and Uyangodage, Lasitha and Ranasinghe, Tharindu},
+ title={{NSINA: A News Corpus for Sinhala}},
+ booktitle={The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)},
+ year={2024},
+ month={May},
+ }
+ ~~~
+ ~~~
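
The loading snippet in the README's Data section calls `Dataset.to_pandas` as an unbound method on the result of `load_dataset`. A minimal sketch of the same steps with the instance-method form is given below. The helper name `load_splits_as_frames` and its `loader` parameter are hypothetical, introduced here only for illustration; they are not part of the commit or of the `datasets` library. The sketch assumes the dataset exposes the `train` and `test` splits named in the README.

```python
from typing import Any, Callable, Dict, Optional, Sequence


def load_splits_as_frames(
    repo_id: str = "sinhala-nlp/NSINA-Media",
    splits: Sequence[str] = ("train", "test"),
    loader: Optional[Callable[..., Any]] = None,
) -> Dict[str, Any]:
    """Return {split_name: DataFrame} for the requested splits.

    `loader` is a hypothetical hook that defaults to `datasets.load_dataset`.
    """
    if loader is None:
        # Imported lazily so the helper can also be exercised with a stub
        # loader when the `datasets` package is not installed.
        from datasets import load_dataset

        loader = load_dataset
    # dataset.to_pandas() is the instance-method form of the
    # Dataset.to_pandas(dataset) call used in the README.
    return {name: loader(repo_id, split=name).to_pandas() for name in splits}
```

Calling `load_splits_as_frames()` with no arguments downloads both splits from the Hub and returns a dictionary, so `frames["train"]` and `frames["test"]` correspond to the README's `train` and `test` variables. Accepting the loader as a parameter is a design choice that keeps the helper testable offline.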