---
license: apache-2.0
language:
- si
widget:
- text: 'writeWiki: මානව ආහාර'
- text: 'writeWiki: ගෝලීයකරණය'
- text: 'writeWiki: ජංගම දුරකථනය'
- text: 'writeWiki: ඇස්කිමෝවරු'
- text: 'writeWiki: අනුරාධපුරය'
datasets:
- wikipedia
---
### Fine-tuned mT5 base model with the Sinhala Wikipedia dataset (experimental continued training)

This model is fine-tuned on articles from Sinhala Wikipedia for article generation. Around 10,000 articles were used for training, and fine-tuning was repeated around 100 times.
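
The exact training script is not published in this card; the snippet below is only a minimal sketch of how such a run could be set up with the Hugging Face `Seq2SeqTrainer`, assuming each input is an article title prefixed with `writeWiki: ` and each target is the article body. The dataset variable `wiki_articles`, the column names, and the hyperparameters are illustrative assumptions, not the actual training configuration.

```py
from transformers import (AutoTokenizer, AutoModelForSeq2SeqLM,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

tokenizer = AutoTokenizer.from_pretrained('google/mt5-base')
model = AutoModelForSeq2SeqLM.from_pretrained('google/mt5-base')

def preprocess(batch):
    # Prefix each article title with "writeWiki: " and use the article body as the target.
    model_inputs = tokenizer(['writeWiki: ' + t for t in batch['title']],
                             truncation=True, max_length=64)
    labels = tokenizer(text_target=batch['text'], truncation=True, max_length=512)
    model_inputs['labels'] = labels['input_ids']
    return model_inputs

# `wiki_articles` is a hypothetical datasets.Dataset with 'title' and 'text'
# columns holding the ~10,000 Sinhala Wikipedia articles.
tokenized = wiki_articles.map(preprocess, batched=True,
                              remove_columns=wiki_articles.column_names)

args = Seq2SeqTrainingArguments(
    output_dir='mt5-sinhala-wikigen',   # illustrative values only
    per_device_train_batch_size=4,
    num_train_epochs=1,
)
trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```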
 

### How to use

We have to add the **"writeWiki: "** prefix at the beginning of each prompt.

You can use this model with a pipeline for text generation.

First, you might need to install the required libraries.
```py
# Reinstall transformers and install the tokenizer dependencies.
!pip uninstall transformers -y
!pip install transformers
!pip install tokenizers sentencepiece
```

Then you might need to restart the runtime, either manually or by ending the process with the code below.
```py
import os
# Kill the current process so the runtime restarts with the new packages.
os.kill(os.getpid(), 9)
```

Then we just have to load the tokenizer and run the pipeline:

```py
from transformers import AutoTokenizer, pipeline

# The checkpoint uses the original mT5 tokenizer.
tokenizer = AutoTokenizer.from_pretrained('google/mt5-base')

# The task is inferred from the model's config.
generator = pipeline(model='Suchinthana/MT5-Sinhala-Wikigen-Experimental', tokenizer=tokenizer)
generator("writeWiki: මානව ආහාර", do_sample=True, max_length=180)
```
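
Alternatively, you can load the model directly and sample with `generate()`. This is a minimal sketch equivalent to the pipeline call above, assuming the same checkpoint and prompt.

```py
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained('google/mt5-base')
model = AutoModelForSeq2SeqLM.from_pretrained('Suchinthana/MT5-Sinhala-Wikigen-Experimental')

# Tokenize a prefixed prompt and sample up to 180 tokens.
inputs = tokenizer('writeWiki: මානව ආහාර', return_tensors='pt')
outputs = model.generate(**inputs, do_sample=True, max_length=180)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```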

If this model shows signs of overfitting, an early-stopped version is available in the [non-experimental repo](https://huggingface.co/Suchinthana/MT-5-Sinhala-Wikigen).