---
language: tr
tags:
- turkish
- tr
- gpt2-tr
- gpt2-turkish
license: mit
metrics:
- accuracy
---

# Turkish GPT-2 Model (Experimental)

I have made available a GPT-2 model for Turkish that I trained on a variety of texts. The model is intended to serve as a starting point for text-specific fine-tuning (a minimal fine-tuning sketch appears at the end of this card).

## Training Source

I used a Turkish corpus drawn from a range of written and spoken sources. Using this training data, I built a custom 50k-token vocabulary with the Tokenizers library (a sketch of a comparable tokenizer setup also appears below). After building the vocabulary, I trained GPT-2 for Turkish on the entire training corpus for ten epochs.

## Using the model

The model itself can be loaded like this:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("ahmet1338/gpt-2-experimental")
model = AutoModelForCausalLM.from_pretrained("ahmet1338/gpt-2-experimental")
```

To generate text, you can use the Transformers pipeline API:

```python
from transformers import pipeline

pipe = pipeline("text-generation",
                model="ahmet1338/gpt-2-experimental",
                tokenizer="ahmet1338/gpt-2-experimental")

# Turkish prompt: "Toward evening, while walking along the road, "
text = pipe("Akşamüstü yolda ilerlerken, ", max_length=800)[0]["generated_text"]
print(text)
```

### How to clone the model repo?

```
git lfs install
git clone https://huggingface.co/ahmet1338/gpt-2-experimental
```
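
## Rebuilding a comparable tokenizer (sketch)

For readers who want to reproduce a comparable vocabulary, the sketch below shows how a 50k-token tokenizer could be trained with the Hugging Face Tokenizers library. This is a minimal illustration, not the exact recipe used for this model: the corpus file name, the byte-level BPE choice, and the special tokens are all assumptions.

```python
# Minimal sketch: training a 50k-token byte-level BPE vocabulary with the
# Hugging Face Tokenizers library. The corpus file name, the byte-level BPE
# choice, and the special tokens are assumptions, not this model's recipe.
from tokenizers import ByteLevelBPETokenizer

tokenizer = ByteLevelBPETokenizer()
tokenizer.train(
    files=["turkish_corpus.txt"],  # placeholder path to the training text
    vocab_size=50_000,             # matches the 50k vocabulary described above
    min_frequency=2,
    special_tokens=["<s>", "<pad>", "</s>", "<unk>", "<mask>"],
)
tokenizer.save_model("tokenizer-tr")  # writes vocab.json and merges.txt
```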
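
## Fine-tuning on your own texts (sketch)

Since the model is meant as a starting point for text-specific adjustments, here is a minimal fine-tuning sketch using the Transformers Trainer API. The file path, output directory, and hyperparameters are placeholders to adapt to your own data; they are not the settings that produced this model.

```python
# Minimal fine-tuning sketch with the Trainer API. File names, output
# directory, and hyperparameters are placeholders, not this model's settings.
# TextDataset is deprecated in recent Transformers releases; the `datasets`
# library is the modern alternative.
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, TextDataset,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("ahmet1338/gpt-2-experimental")
model = AutoModelForCausalLM.from_pretrained("ahmet1338/gpt-2-experimental")

# TextDataset chunks a plain-text file into fixed-length token blocks.
train_dataset = TextDataset(tokenizer=tokenizer,
                            file_path="my_texts.txt",  # placeholder corpus
                            block_size=128)

# mlm=False -> standard causal language modeling (GPT-2 style).
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="gpt2-tr-finetuned",
                           num_train_epochs=1,
                           per_device_train_batch_size=4),
    data_collator=collator,
    train_dataset=train_dataset,
)
trainer.train()
```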