Update README.md

README.md (CHANGED)
---
license: mit
tags:
- pytorch
- sinhala
- gpt2
model-index:
- name: sinhala-gpt2
  results: []
widget:
- text: මහ
- text: සංවිධ
- text: දුර්ලභ
- text: තනිවීලා
- text: ඔබ
inference:
  parameters:
    do_sample: false
    temperature: 0.3
language:
- si
---

# sinhala-gpt-lyrics

This model has been fine-tuned from the [gpt2](https://huggingface.co/gpt2) architecture on a dataset of Sinhala news drawn from various sources.

Even though this fine-tuned GPT-2 is quite simple, it is still capable of generating news articles that look just like the real thing. Take, for example, the following samples (some of them are hilarious, though :D):

- "ඔබ විසින් මෙම විරෝධතාව සංවිධානය කර තිබුණේ නැහැ කියලා හිටපු ජනාධිපති මහ"
- "දුර්ලභ ගණයේ විශ්වවිද්යාල ප්රතිපාදන කොමිෂන් සභාවේ සභාපති මහාචාර්ය ජී එල්"

⚠️ Since the dataset used for this model is composed mostly of news articles, the model is heavily biased towards generating news content. This bias may become apparent during generation.

## Training procedure

The model was trained for roughly 12 hours on Kaggle GPUs.
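
The card does not include the training script itself. As a rough orientation, the sketch below shows one way such a causal-LM fine-tune could be set up with the transformers Trainer API; the data file, tokenization length, and every hyperparameter here are illustrative assumptions (the values actually used are listed under "Training hyperparameters" below).

```python
# Hypothetical sketch only: not the card author's script.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 ships without a pad token
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Assumed corpus: one Sinhala news item per line in a plain-text file
dataset = load_dataset("text", data_files={"train": "sinhala_news.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# mlm=False selects the causal-LM objective GPT-2 is trained with
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="sinhala-gpt2",
    per_device_train_batch_size=8,  # assumed
    num_train_epochs=3,             # assumed
    fp16=True,                      # mixed precision on Kaggle GPUs
)

Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=collator,
).train()
```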

## Usage Details

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

tokenizer = AutoTokenizer.from_pretrained("Ransaka/sinhala-gpt2")
model = AutoModelForCausalLM.from_pretrained("Ransaka/sinhala-gpt2")

# wrap the model in a text-generation pipeline
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
generator("දුර")  # දුර ඈත පාසැල් වියේ පසුවූයේ මෙම සිද්ධිය සම්බන්ධයෙන් විමර්ශන සිදුකරන බවයි
```
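
The `inference.parameters` front matter pins the hosted widget to `do_sample: false` and `temperature: 0.3`. The same generation parameters can be passed straight through the pipeline call; the prompt and token budget below are arbitrary examples:

```python
# Greedy decoding, mirroring the widget's do_sample: false
# (temperature only takes effect when do_sample=True)
print(generator("ඔබ", do_sample=False, max_new_tokens=30)[0]["generated_text"])

# Low-temperature sampling: slightly more varied, still conservative
print(generator("ඔබ", do_sample=True, temperature=0.3, max_new_tokens=30)[0]["generated_text"])
```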

Or, using git:

```bash
git lfs install
git clone https://huggingface.co/Ransaka/sinhala-gpt2
```
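
After cloning, `from_pretrained` also accepts the local checkout directory in place of the Hub id (assuming the default clone location):

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# "./sinhala-gpt2" is the directory created by the clone above
tokenizer = AutoTokenizer.from_pretrained("./sinhala-gpt2")
model = AutoModelForCausalLM.from_pretrained("./sinhala-gpt2")
```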

### Training hyperparameters

The following hyperparameters were used during training:

### Framework versions

- Transformers 4.26.1
- Pytorch 1.13.0
- Datasets 2.1.0
- Tokenizers 0.13.2