---
language: tr
tags:
- turkish
- tr
- gpt2-tr
- gpt2-turkish
license: mit
metrics:
- accuracy
---

# 🇹🇷 Turkish GPT-2 Model

In this repository I release a GPT-2 model trained on a variety of Turkish texts. The model is intended as a starting point for fine-tuning on other texts.

## Training corpora

I used a Turkish corpus collected from a range of written and spoken sources. Using the Tokenizers library, I created a 52K BPE vocabulary from the training corpus. With the vocabulary in place, I trained GPT-2 on the complete training corpus for five epochs.

Training logs: https://tensorboard.dev/experiment/3AWKv8bBTaqcqZP5frtGkw/#scalars

## Using the model

The model can be loaded as follows:

``` python
from transformers import AutoTokenizer, AutoModelForCausalLM

# AutoModelForCausalLM replaces the deprecated AutoModelWithLMHead for GPT-2 style models
tokenizer = AutoTokenizer.from_pretrained("ahmet1338/gpt2-turkish-cased")
model = AutoModelForCausalLM.from_pretrained("ahmet1338/gpt2-turkish-cased")
```

The following example uses the Transformers `pipeline` API to generate text:

``` python
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="ahmet1338/gpt2-turkish-cased",
    tokenizer="ahmet1338/gpt2-turkish-cased",
)
# Generation settings such as max_length are passed when calling the pipeline
text = pipe("Akşamüstü yolda ilerlerken, ", max_length=800)[0]["generated_text"]
print(text)
```

### How to clone the model repo?

```
git lfs install
git clone https://huggingface.co/ahmet1338/gpt2-turkish-cased
```
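The Training corpora section notes that the 52K BPE vocabulary was built with the Tokenizers library, but the exact settings are not given. The sketch below shows one way to train such a vocabulary, assuming a byte-level BPE tokenizer (the scheme GPT-2 uses), a `min_frequency` of 2, and `<|endoftext|>` as the only special token. `train_bpe_vocab` and `SAMPLE_CORPUS` are hypothetical names, and the three sample sentences merely stand in for the actual training corpus.

```python
import os
import tempfile

from tokenizers import ByteLevelBPETokenizer

# Hypothetical stand-in corpus; the real training corpus is not part of this repo
SAMPLE_CORPUS = [
    "Akşamüstü yolda ilerlerken, güneş yavaş yavaş batıyordu.",
    "Yolda ilerlerken eski bir arkadaşıma rastladım.",
    "Akşam yemeğinden sonra yolda yürüyüşe çıktık.",
]


def train_bpe_vocab(lines, vocab_size=52_000):
    """Train a byte-level BPE tokenizer (assumed setup) on an iterable of strings."""
    tokenizer = ByteLevelBPETokenizer()
    tokenizer.train_from_iterator(
        lines,
        vocab_size=vocab_size,             # upper bound on the vocabulary size
        min_frequency=2,                   # assumption: skip pairs seen only once
        special_tokens=["<|endoftext|>"],  # GPT-2's end-of-text token
        show_progress=False,
    )
    return tokenizer


if __name__ == "__main__":
    tok = train_bpe_vocab(SAMPLE_CORPUS)
    enc = tok.encode("Akşamüstü yolda ilerlerken,")
    print(enc.tokens)           # byte-level BPE pieces
    print(tok.decode(enc.ids))  # byte-level BPE decodes back to the input text
    with tempfile.TemporaryDirectory() as out_dir:
        files = tok.save_model(out_dir)  # writes vocab.json and merges.txt
        print(sorted(os.path.basename(f) for f in files))
```

On a corpus this small the vocabulary stays far below 52,000 entries, since `vocab_size` is only an upper bound. The `vocab.json` and `merges.txt` files written by `save_model` are the files a GPT-2 tokenizer loads.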