Ransaka committed
Commit 24725bb
1 parent: 3ac37a1

Update README.md

Files changed (1): README.md (+22 −18)
@@ -1,47 +1,51 @@
---
license: mit
tags:
- - generated_from_trainer
model-index:
- name: sinhala-gpt2
  results: []
widget:
- - text: "මම"
- - text: "මල්"
- - text: "ඔබ"
- - text: "තනිවීලා"
- - text: "හීන"
inference:
  parameters:
    do_sample: false
    temperature: 0.3
---

# sinhala-gpt-lyrics

- This particular model has undergone fine-tuning based on the [gpt2](https://huggingface.co/gpt2) architecture, utilizing a dataset of around 500,000 Sinhala lyrics from various sources.
- Although this model may not be considered a highly sophisticated lyrics generator, it is still capable of generating intriguing lyrics. For instance, consider the following samples:
- - "හිනහෙනවා ආදරේ නැති වෙන්න බෑ ලුහු බදින්න බෑ ලුහු බදින්න නෑ . කඳුළු පිස ගන්න බෑල්ලු දේවල් කවදාවත් දැන්.?"
- - "දුර යාම නොදැනීම හීන ගානේ ඉන්නේ තනිවීලා නොදැනීම"
- - "මල් පාරේ යමු අපි සිතු සේ නුඹ ළඟම"

## Training procedure
- The model was trained for approximately 7 hours on Kaggle GPUs.

## Usage Details

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

- tokenizer = AutoTokenizer.from_pretrained("Ransaka/sinhala-gpt-lyrics")
- model = AutoModelForCausalLM.from_pretrained("Ransaka/sinhala-gpt-lyrics")
- generator = pipeline('text-generation', model=model, tokenizer=tokenizer)
- generator("දුර") # දුර ඈත පාසැල් වියේ.
```
or using git
```bash
git lfs install
- git clone https://huggingface.co/Ransaka/sinhala-gpt-lyrics
```

### Training hyperparameters
@@ -69,4 +73,4 @@ The following hyperparameters were used during training:
- Transformers 4.26.1
- Pytorch 1.13.0
- Datasets 2.1.0
- - Tokenizers 0.13.2

---
license: mit
tags:
+ - pytorch
+ - sinhala
+ - gpt2
model-index:
- name: sinhala-gpt2
  results: []
widget:
+ - text: මහ
+ - text: සංවිධ
+ - text: දුර්ලභ
+ - text: තනිවීලා
+ - text: ඔබ
inference:
  parameters:
    do_sample: false
    temperature: 0.3
+ language:
+ - si
---

# sinhala-gpt-lyrics

+ This particular model has undergone fine-tuning based on the [gpt2](https://huggingface.co/gpt2) architecture, utilizing a dataset of Sinhala news articles from various sources.
+ Even though this fine-tuned version of GPT-2 is quite simple, it is still capable of generating text that reads like genuine news articles. Take, for example, the following samples (some of them are hilarious though :D):
+ - "ඔබ විසින් මෙම විරෝධතාව සංවිධානය කර තිබුණේ නැහැ කියලා හිටපු ජනාධිපති මහ"
+ - "දුර්ලභ ගණයේ විශ්වවිද්යාල ප්රතිපාදන කොමිෂන් සභාවේ සභාපති මහාචාර්ය ජී එල්"
+ ⚠️ Since the dataset used for this model is mostly composed of news articles, it is heavily biased towards generating news content. This bias may become apparent during generation.

## Training procedure
+ The model was trained for more than 12 hours on Kaggle GPUs.

## Usage Details

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

+ tokenizer = AutoTokenizer.from_pretrained("Ransaka/sinhala-gpt2")
+ model = AutoModelForCausalLM.from_pretrained("Ransaka/sinhala-gpt2")
+ generator = pipeline('text-generation', model=model, tokenizer=tokenizer)
+ generator("දුර") # දුර ඈත පාසැල් වියේ පසුවූයේ මෙම සිද්ධිය සම්බන්ධයෙන් විමර්ශන සිදුකරන බවයි
```
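The card's inference settings use `do_sample: false` together with `temperature: 0.3`. With sampling disabled, decoding is greedy (argmax over next-token logits), so temperature rescales the probabilities without changing which token is picked. A minimal sketch of that interaction, using hypothetical toy logits rather than the model's real outputs:

```python
import math

# Hypothetical next-token logits for three candidate tokens (not from the model).
logits = [2.0, 1.0, 0.5]

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax: lower temperature sharpens the distribution."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Greedy decoding (do_sample: false) simply takes the argmax of the logits.
greedy_choice = max(range(len(logits)), key=lambda i: logits[i])

# Temperature changes the probabilities but never the argmax, so with
# sampling disabled the temperature setting does not affect the output.
p_default = softmax(logits, temperature=1.0)
p_sharp = softmax(logits, temperature=0.3)
assert p_default.index(max(p_default)) == p_sharp.index(max(p_sharp)) == greedy_choice
```

Temperature only matters once `do_sample` is enabled, at which point values below 1.0 concentrate probability mass on the most likely tokens.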
or using git
```bash
git lfs install
+ git clone https://huggingface.co/Ransaka/sinhala-gpt2
```

### Training hyperparameters

- Transformers 4.26.1
- Pytorch 1.13.0
- Datasets 2.1.0
+ - Tokenizers 0.13.2