---
language: tr
tags:
- turkish
- tr
- gpt2-tr
- gpt2-turkish
license: mit
metrics:
- accuracy
---
# 🇹🇷 Turkish GPT-2 Model

In this repository I release a GPT-2 model trained on a variety of Turkish texts.

The model is meant to be an entry point for fine-tuning on other texts.

## Training corpora

I used a Turkish corpus drawn from various written and oral sources.


With the Tokenizers library, I created a 52K BPE vocab based on the training corpus.
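
As a rough illustration of the vocab step, the sketch below trains a byte-level BPE tokenizer with the Tokenizers library. The exact tokenizer class and settings used for this model are not stated here, so `ByteLevelBPETokenizer`, `min_frequency=2`, the `<|endoftext|>` special token, and the tiny in-memory `sample_corpus` are all assumptions made for the example.

``` python
from tokenizers import ByteLevelBPETokenizer

# Hypothetical stand-in for the real training corpus (not the actual data).
sample_corpus = [
    "Akşamüstü yolda ilerlerken hava kararmaya başladı.",
    "Türkçe metinler üzerinde bir dil modeli eğitiyoruz.",
    "Bu cümle yalnızca örnek olarak eklenmiştir.",
]

# Byte-level BPE is the scheme used by the original GPT-2 (assumed here).
tokenizer = ByteLevelBPETokenizer()
tokenizer.train_from_iterator(
    sample_corpus,
    vocab_size=52_000,                # upper bound; a tiny corpus yields far fewer tokens
    min_frequency=2,                  # assumed value
    special_tokens=["<|endoftext|>"], # assumed special token
    show_progress=False,
)

# Encode a sample sentence and inspect the result.
encoding = tokenizer.encode("Akşamüstü yolda ilerlerken")
print(encoding.tokens)
print(tokenizer.get_vocab_size())
```

On a real corpus you would pass file paths to `tokenizer.train(files=[...])` instead and write the result to disk with `tokenizer.save_model(...)`.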

After creating the vocab, I trained the GPT-2 model for Turkish on the complete training corpus for five epochs.

Logs during training:
https://tensorboard.dev/experiment/3AWKv8bBTaqcqZP5frtGkw/#scalars



## Using the model

The model itself can be used in this way:

``` python
from transformers import AutoTokenizer, AutoModelForCausalLM

# AutoModelForCausalLM replaces the deprecated AutoModelWithLMHead class
tokenizer = AutoTokenizer.from_pretrained("ahmet1338/gpt2-turkish-cased")
model = AutoModelForCausalLM.from_pretrained("ahmet1338/gpt2-turkish-cased")
```

Here's an example that shows how to generate text with a Transformers pipeline:

``` python
from transformers import pipeline

pipe = pipeline("text-generation", model="ahmet1338/gpt2-turkish-cased",
                tokenizer="ahmet1338/gpt2-turkish-cased")
# Generation settings such as max_length are passed at call time,
# not through the pipeline's `config` argument
text = pipe("Akşamüstü yolda ilerlerken, ", max_length=800)[0]["generated_text"]
print(text)
```

### How to clone the model repo?
```
git lfs install
git clone https://huggingface.co/ahmet1338/gpt2-turkish-cased
```