---
license: mit
tags:
- pytorch
- gpt2
model-index:
- name: sinhala-gpt2
  results: []
widget:
- text: මහ
- text: සංවිධ
- text: දුර්ලභ
- text: තනිවීලා
- text: ඔබ
# inference:
#   parameters:
#     do_sample: false
#     temperature: 0.2
#     max_new_tokens: 30
language:
- si
---

# sinhala-gpt2

This model was fine-tuned from [gpt2](https://huggingface.co/gpt2) on a dataset of Sinhala news collected from various sources.
Even though the training setup is quite simple, the model can still generate text that closely resembles real news articles. Take, for example, the following samples (some of them are hilarious, though :D):
- "ඔබ විසින් මෙම විරෝධතාව සංවිධානය කර තිබුණේ නැහැ කියලා හිටපු ජනාධිපති මහ"
- "දුර්ලභ ගණයේ විශ්වවිද්යාල ප්රතිපාදන කොමිෂන් සභාවේ සභාපති මහාචාර්ය ජී එල්"

⚠️ Since the dataset used for this model is composed mostly of news articles, the model is heavily biased toward generating news-style content. This bias is likely to show up in generated text.

## Training procedure
The model was trained for 12+ hours on Kaggle GPUs.

## Usage Details

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

tokenizer = AutoTokenizer.from_pretrained("Ransaka/sinhala-gpt2")
model = AutoModelForCausalLM.from_pretrained("Ransaka/sinhala-gpt2")

# Wrap the model and tokenizer in a text-generation pipeline
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
generator("දුර")  # දුර ඈත පාසැල් වියේ පසුවූයේ මෙම සිද්ධිය සම්බන්ධයෙන් විමර්ශන සිදුකරන බවයි
```
or clone the repository with git:
```bash
git lfs install
git clone https://huggingface.co/Ransaka/sinhala-gpt2
```
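
For more control over decoding, generation arguments can be passed directly to the pipeline call. The settings below mirror the commented-out inference parameters in this card's front matter and are purely illustrative:

```python
# Illustrative decoding settings, mirroring the commented-out widget
# configuration (do_sample, temperature, max_new_tokens) at the top of this card.
outputs = generator(
    "ඔබ",
    max_new_tokens=30,
    do_sample=False,  # greedy decoding; set do_sample=True, temperature=0.2 to sample instead
)
print(outputs[0]["generated_text"])
```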

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 3
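
The original training script is not included here, but the hyperparameters above map onto `transformers.TrainingArguments` roughly as in the minimal sketch below. The dataset variables (`train_dataset`, `eval_dataset`) are hypothetical tokenized splits of the Sinhala news data, and the Adam settings listed above are the `Trainer` defaults.

```python
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Start from the base gpt2 checkpoint, as described above.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("gpt2")

training_args = TrainingArguments(
    output_dir="sinhala-gpt2",
    learning_rate=2e-05,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    lr_scheduler_type="linear",
    num_train_epochs=3,
    # Adam with betas=(0.9, 0.999) and epsilon=1e-08 is the default optimizer.
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,  # hypothetical tokenized Sinhala news train split
    eval_dataset=eval_dataset,    # hypothetical tokenized validation split
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```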

### Training results

| Training Loss | Epoch | Step  | Validation Loss |
|:-------------:|:-----:|:-----:|:---------------:|
| 2.0233        | 1.0   | 15323 | 2.3348          |
| 1.6938        | 2.0   | 30646 | 1.8377          |
| 1.4938        | 3.0   | 45969 | 1.6498          |


### Framework versions

- Transformers 4.26.1
- Pytorch 1.13.0
- Datasets 2.1.0
- Tokenizers 0.13.2
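
To reproduce this environment, the listed versions can be installed from PyPI (package names assumed to be the standard ones):

```bash
pip install transformers==4.26.1 torch==1.13.0 datasets==2.1.0 tokenizers==0.13.2
```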