---
license: apache-2.0
language:
- si
widget:
- text: 'writeWiki: මානව ආහාර'
- text: 'writeWiki: ගෝලීයකරණය'
- text: 'writeWiki: ජංගම දුරකථනය'
- text: 'writeWiki: ඇස්කිමෝවරු'
- text: 'writeWiki: අනුරාධපුරය'
datasets:
- wikipedia
---
|
### Fine-tuned MT5-base model on the Sinhala Wikipedia dataset (experimental, training continues)
|
|
|
This model is fine-tuned on articles from Sinhala Wikipedia for article generation. Around 10,000 articles were used for training, and fine-tuning was repeated around 100 times.
|
|
|
|
|
### How to use
|
|
|
Each prompt must begin with the **"writeWiki: "** prefix.
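
As a minimal illustration (the `make_prompt` helper below is hypothetical, not part of the model), a prompt can be built like this:

```py
# Hypothetical helper: prepend the required "writeWiki: " prefix to a topic
def make_prompt(topic: str) -> str:
    return f"writeWiki: {topic}"

make_prompt("ජංගම දුරකථනය")  # -> 'writeWiki: ජංගම දුරකථනය'
```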
|
|
|
You can use this model with a pipeline for text generation.
|
|
|
First, you may need to install the required libraries.
|
```py
!pip uninstall transformers -y
!pip install transformers
!pip install tokenizers sentencepiece
```
|
|
|
Then you may need to restart the runtime, either manually or by running the code below to end it.
|
```py
import os

# Kill the current process so the runtime restarts with the freshly installed packages
os.kill(os.getpid(), 9)
```
|
|
|
Then we just have to import the tokenizer and run the pipeline:
|
|
|
```py
from transformers import AutoTokenizer, pipeline

# Load the mT5 tokenizer the model was fine-tuned with
tokenizer = AutoTokenizer.from_pretrained('google/mt5-base')

# Build a generation pipeline around the fine-tuned checkpoint
# (the task is inferred from the model config)
generator = pipeline(model='Suchinthana/MT5-Sinhala-Wikigen-Experimental', tokenizer=tokenizer)

# Generate an article; remember the "writeWiki: " prefix
generator("writeWiki: මානව ආහාර", do_sample=True, max_length=180)
```
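
The call above returns a list of dictionaries. As a minimal sketch (assuming the pipeline exposes the usual `generated_text` key of text2text-generation output), you can print just the generated string:

```py
# Generate and print only the produced text
outputs = generator("writeWiki: ගෝලීයකරණය", do_sample=True, max_length=180)
print(outputs[0]['generated_text'])
```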
|
|
|
If this model shows overfitting, an early-stopped version is available in the [non-experimental repo](https://huggingface.co/Suchinthana/MT-5-Sinhala-Wikigen).
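
To try that checkpoint instead, the usage should stay the same apart from the repository name (a sketch, assuming the early-stopped model uses the same tokenizer and prompt format):

```py
from transformers import AutoTokenizer, pipeline

# Same setup as above, pointing at the early-stopped (non-experimental) checkpoint
tokenizer = AutoTokenizer.from_pretrained('google/mt5-base')
generator = pipeline(model='Suchinthana/MT-5-Sinhala-Wikigen', tokenizer=tokenizer)
generator("writeWiki: අනුරාධපුරය", do_sample=True, max_length=180)
```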