---
language: tr
tags:
- turkish
- tr
- gpt2-tr
- gpt2-turkish
license: mit
metrics:
- accuracy
---
# Turkish GPT-2 Model (Experimental)

I have released a GPT-2 model for Turkish, trained on a variety of texts.

The model is intended as a starting point for fine-tuning on specific kinds of text.


## Training Source

I used a Turkish corpus drawn from a range of written and spoken sources.


From the training corpus, I built a custom tokenizer with a 50k-token vocabulary using the Tokenizers library.
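The card does not name the tokenization algorithm. GPT-2 conventionally uses byte-pair encoding (BPE), which the Tokenizers library implements. Assuming BPE, the toy pure-Python sketch below (the function `learn_bpe` is hypothetical and not part of Tokenizers; it works on characters rather than bytes) shows the core idea: repeatedly merge the most frequent adjacent pair of symbols, adding one vocabulary entry per merge until the target size, here 50k, is reached.

```python
from collections import Counter


def learn_bpe(words, num_merges):
    """Toy character-level BPE (hypothetical sketch, not the Tokenizers code)."""
    # Each word becomes a tuple of single-character symbols, weighted by count
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair across the corpus
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)  # first most frequent pair wins ties
        merges.append(best)
        # Rewrite every word, fusing each occurrence of the chosen pair
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merges, vocab


merges, vocab = learn_bpe(["evler", "evde", "evden", "ev"], 2)
print(merges)
```

On the words `evler`, `evde`, `evden` and `ev`, the first merge is `('e', 'v')`, because "ev" starts every word.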

After building the vocabulary, I trained GPT-2 for Turkish on the entire training corpus for ten epochs.
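The card does not describe how the corpus was fed to the model. A common setup for GPT-2-style causal language modeling, assumed here, is to concatenate the tokenized documents, cut them into fixed-length blocks (similar to the `group_texts` helper in the Transformers `run_clm.py` example) and train each position to predict the next token. The helper names and numbers below are hypothetical:

```python
import math


def group_into_blocks(token_lists, block_size):
    """Concatenate tokenized documents and split them into fixed-length blocks.

    Trailing tokens that do not fill a complete block are dropped.
    """
    flat = [tok for doc in token_lists for tok in doc]
    usable = (len(flat) // block_size) * block_size
    return [flat[i:i + block_size] for i in range(0, usable, block_size)]


def next_token_pairs(block):
    """Inputs and targets for next-token prediction: target[i] == input[i + 1]."""
    return block[:-1], block[1:]


def steps_for_epochs(num_blocks, batch_size, epochs):
    """Optimizer steps needed to pass over all blocks `epochs` times."""
    return math.ceil(num_blocks / batch_size) * epochs


blocks = group_into_blocks([[1, 2, 3], [4, 5, 6, 7]], block_size=3)
print(blocks)                          # token 7 is dropped
print(next_token_pairs(blocks[0]))
print(steps_for_epochs(10, batch_size=4, epochs=10))
```

The real block size, batch size and corpus size for this model are not stated; the values above only illustrate how the ten epochs translate into training steps.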



## Using the model

Load the tokenizer and the model as follows:

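``` python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Download the tokenizer and the GPT-2 weights from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("ahmet1338/gpt-2-experimental")
# AutoModelForCausalLM replaces the deprecated AutoModelWithLMHead for GPT-2
model = AutoModelForCausalLM.from_pretrained("ahmet1338/gpt-2-experimental")
```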


To generate text, you can use a Transformers pipeline:

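``` python
from transformers import pipeline

# Build a text-generation pipeline around the model and its tokenizer
pipe = pipeline("text-generation", model="ahmet1338/gpt-2-experimental",
                tokenizer="ahmet1338/gpt-2-experimental")
# Generation options such as max_length are passed when calling the pipeline
# (prompt: "While moving along the road in the late afternoon, ")
text = pipe("Akşamüstü yolda ilerlerken, ", max_length=800)[0]["generated_text"]
print(text)
```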
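In Transformers, `max_length` bounds the total sequence length, prompt tokens included; `max_new_tokens` counts only generated tokens. The toy greedy-decoding loop below is a hypothetical sketch, not the library's implementation (`greedy_generate` and `toy_model` are made-up names), that shows this stopping rule and an optional end-of-sequence id:

```python
def greedy_generate(next_token_scores, prompt_ids, max_length, eos_id=None):
    """Toy greedy decoding: append the highest-scoring token until a limit.

    next_token_scores: callable mapping the current id list to one score per
    vocabulary entry. Like max_length in Transformers, the limit counts the
    prompt plus the generated tokens.
    """
    ids = list(prompt_ids)
    while len(ids) < max_length:
        scores = next_token_scores(ids)
        best = max(range(len(scores)), key=scores.__getitem__)
        ids.append(best)
        if best == eos_id:
            break  # stop early once the end-of-sequence token is produced
    return ids


def toy_model(ids):
    """Hypothetical 5-token 'model' that always prefers (last id + 1) mod 5."""
    return [1.0 if i == (ids[-1] + 1) % 5 else 0.0 for i in range(5)]


print(greedy_generate(toy_model, [0], max_length=4))
print(greedy_generate(toy_model, [0], max_length=4, eos_id=2))
```

With a one-token prompt and `max_length=4`, only three new tokens are generated, which is why a long prompt leaves less room for output under `max_length`.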

### How to clone the model repo?
```
git lfs install
git clone https://huggingface.co/ahmet1338/gpt-2-experimential
```