tharindu committed on
Commit
dddc91c
1 Parent(s): 5305790

Update README.md

Files changed (1)
  1. README.md +36 -0
README.md CHANGED
@@ -1,3 +1,39 @@
  ---
  license: cc-by-sa-4.0
+ datasets:
+ - sinhala-nlp/NSINA-Headlines
+ - sinhala-nlp/NSINA
+ language:
+ - si
  ---
+
+ # Sinhala Headline Generation
+ This is a text generation task built on the [NSINA dataset](https://github.com/Sinhala-NLP/NSINA), and the data is released under the same license as NSINA. The objective is to generate a news headline from the content of a news article.
+
+
+ ## Data
+ We used the same instances as NSINA 1.0, since every news article in it has a headline. We divided the dataset into training and test sets using a 0.8 split.
+ The data can be loaded into pandas DataFrames with the following code.
+
+ ```python
+ from datasets import load_dataset
+
+ # Load each split and convert it to a pandas DataFrame
+ train = load_dataset('sinhala-nlp/NSINA-Headlines', split='train').to_pandas()
+ test = load_dataset('sinhala-nlp/NSINA-Headlines', split='test').to_pandas()
+ ```
+
+
+
+ ## Citation
+ If you are using the dataset or the models, please cite the following paper.
+
+ ~~~
+ @inproceedings{Nsina2024,
+ author={Hettiarachchi, Hansi and Premasiri, Damith and Uyangodage, Lasitha and Ranasinghe, Tharindu},
+ title={{NSINA: A News Corpus for Sinhala}},
+ booktitle={The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)},
+ year={2024},
+ month={May},
+ }
+ ~~~