ahmet1338 committed on
Commit
e113261
1 Parent(s): e9e1c9d

Update README.md

Files changed (1)
  1. README.md +13 -14
README.md CHANGED
@@ -9,23 +9,21 @@ license: mit
 metrics:
 - accuracy
 ---
-# 🇹🇷 Turkish GPT-2 Model
+# Turkish GPT-2 Model (Experimental)

-In this repository I release GPT-2 model, that was trained on various texts for Turkish.
+I have released a GPT-2 model for Turkish that I trained on a variety of texts.

-The model is meant to be an entry point for fine-tuning on other texts.
+The model is intended as a starting point for fine-tuning on other texts.

-## Training corpora
+## Training Source

 I used a Turkish corpus drawn from a variety of written and spoken sources.

-With the Tokenizers library, I created a 52K BPE vocab based on the training corpus.
+Using the Tokenizers library, I created a 50k BPE vocabulary based on the training corpus.

-After creating the vocab, I could train the GPT-2 for Turkish on over the complete training corpus (five epochs).
-
-Logs during training:
-https://tensorboard.dev/experiment/3AWKv8bBTaqcqZP5frtGkw/#scalars
+After creating the vocabulary, I trained GPT-2 for Turkish on the complete training corpus (ten epochs).
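
The vocabulary-building step itself is not shown in the README. A minimal sketch of how a 50k byte-level BPE vocabulary could be created with the Hugging Face tokenizers library follows; the corpus file name is hypothetical.

``` python
from tokenizers import ByteLevelBPETokenizer

# Train a byte-level BPE tokenizer on the Turkish corpus
# ("turkish_corpus.txt" is a hypothetical file name).
tokenizer = ByteLevelBPETokenizer()
tokenizer.train(
    files=["turkish_corpus.txt"],
    vocab_size=50000,                  # the 50k vocabulary mentioned above
    min_frequency=2,
    special_tokens=["<|endoftext|>"],  # GPT-2's end-of-text token
)

# Writes vocab.json and merges.txt, the files a GPT-2 tokenizer consumes.
tokenizer.save_model(".")
```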
@@ -35,16 +33,17 @@ The model itself can be used in this way:

 ``` python
 from transformers import AutoTokenizer, AutoModelWithLMHead
-tokenizer = AutoTokenizer.from_pretrained("ahmet1338/gpt2-turkish-cased")
-model = AutoModelWithLMHead.from_pretrained("ahmet1338/gpt2-turkish-cased")
+tokenizer = AutoTokenizer.from_pretrained("ahmet1338/gpt-2-experimental")
+model = AutoModelWithLMHead.from_pretrained("ahmet1338/gpt-2-experimental")
 ```

-Here's an example that shows how to use the great Transformers Pipelines for generating text:
+To generate text, we can use the Transformers pipeline:

 ``` python
 from transformers import pipeline
-pipe = pipeline('text-generation', model="ahmet1338/gpt2-turkish-cased",
-                tokenizer="ahmet1338/gpt2-turkish-cased", config={'max_length':800})
+pipe = pipeline('text-generation', model="ahmet1338/gpt-2-experimental",
+                tokenizer="ahmet1338/gpt-2-experimental", config={'max_length':800})
 text = pipe("Akşamüstü yolda ilerlerken, ")[0]["generated_text"]
 print(text)
 ```
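
A caveat on the snippets above: `AutoModelWithLMHead` is deprecated in current versions of transformers, with `AutoModelForCausalLM` as the replacement for GPT-2-style models, and generation parameters such as `max_length` are normally passed to the pipeline call or to `generate()` directly rather than through `config`. A sketch of the same usage with the current API, assuming the checkpoint name from the diff:

``` python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("ahmet1338/gpt-2-experimental")
model = AutoModelForCausalLM.from_pretrained("ahmet1338/gpt-2-experimental")

# Tokenize a Turkish prompt and sample a continuation.
inputs = tokenizer("Akşamüstü yolda ilerlerken, ", return_tensors="pt")
outputs = model.generate(**inputs, max_length=100, do_sample=True, top_k=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```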
 
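
Since the README positions the model as a starting point for fine-tuning, a minimal fine-tuning sketch with the Trainer API is given below; the dataset file and output directory are hypothetical, and the hyperparameters are illustrative only.

``` python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("ahmet1338/gpt-2-experimental")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("ahmet1338/gpt-2-experimental")

# "my_texts.txt" is a hypothetical fine-tuning corpus, one document per line.
dataset = load_dataset("text", data_files={"train": "my_texts.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="gpt2-turkish-finetuned", num_train_epochs=1),
    train_dataset=tokenized,
    # mlm=False selects the causal-LM objective that GPT-2 is trained with.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```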