ahmet1338/gpt-2-experimental

	---
	language: tr
	tags:
	- turkish
	- tr
	- gpt2-tr
	- gpt2-turkish
	license: mit
	metrics:
	- accuracy
	---
	# 🇹🇷 Turkish GPT-2 Model

	In this repository I release a GPT-2 model trained on a variety of Turkish texts.

	The model is meant to be an entry point for fine-tuning on other texts.

	## Training corpora

	I used a Turkish corpus drawn from a range of written and spoken sources.


	Using the Tokenizers library, I built a 52K BPE vocabulary from the training corpus.
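
To give a feel for what building a BPE vocabulary involves, here is a minimal pure-Python sketch of the merge-learning loop. It is a toy illustration, not the Tokenizers library's implementation: it works on characters rather than bytes, and the function name `learn_bpe_merges` and the four-word corpus are hypothetical stand-ins for the real training data.

```python
from collections import Counter


def learn_bpe_merges(words, num_merges):
    """Learn up to `num_merges` BPE merge rules from a list of words (toy version)."""
    # Represent each distinct word as a tuple of characters, weighted by its frequency.
    vocab = Counter(tuple(word) for word in words)
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent symbol pair occurs across the corpus.
        pair_counts = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break  # every word is already a single symbol
        # Pick the most frequent pair; ties go to the pair that was seen first.
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        # Rewrite every word with the chosen pair fused into one symbol.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges


# Hypothetical toy corpus; the real training corpus is not part of this README.
corpus = ["evler", "evde", "evden", "ev"]
merges = learn_bpe_merges(corpus, num_merges=3)
print(merges)
```

Each learned merge adds one entry to the vocabulary, so a 52K vocabulary corresponds to running this kind of loop until the base symbols plus merges reach that size.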

	After creating the vocabulary, I trained the Turkish GPT-2 model on the complete training corpus for five epochs.

	Logs during training:
	https://tensorboard.dev/experiment/3AWKv8bBTaqcqZP5frtGkw/#scalars



	## Using the model

	The model itself can be used in this way:

	``` python
	from transformers import AutoTokenizer, AutoModelForCausalLM

	# AutoModelForCausalLM replaces the deprecated AutoModelWithLMHead.
	tokenizer = AutoTokenizer.from_pretrained("ahmet1338/gpt2-turkish-cased")
	model = AutoModelForCausalLM.from_pretrained("ahmet1338/gpt2-turkish-cased")
	```
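
Once the tokenizer and model are loaded, text can be generated directly with `generate`. The following is a minimal sketch, assuming PyTorch is installed; the helper name `generate_continuation` and the sampling settings (`max_new_tokens=50`, `top_k=50`) are illustrative choices, not values from the training setup.

```python
def generate_continuation(prompt, model_id="ahmet1338/gpt2-turkish-cased", max_new_tokens=50):
    """Return the prompt followed by a sampled continuation from the model."""
    # Imported lazily so that defining the function does not require the library.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    # Encode the prompt as PyTorch tensors (input_ids and attention_mask).
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,  # assumed length budget, not from this README
        do_sample=True,  # sample instead of greedy decoding
        top_k=50,  # assumed sampling setting
        pad_token_id=tokenizer.eos_token_id,  # GPT-2 defines no pad token by default
    )
    # Decode the first (and only) returned sequence back into text.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_continuation("Akşamüstü yolda ilerlerken, ")` downloads the checkpoint on first use and returns a different continuation on each call, because sampling is enabled.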

	The following example uses the Transformers pipeline API to generate text:

	``` python
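	from transformers import pipeline

	# Build a text-generation pipeline from the released checkpoint.
	pipe = pipeline("text-generation", model="ahmet1338/gpt2-turkish-cased",
	                tokenizer="ahmet1338/gpt2-turkish-cased")
	# Pass the length limit at call time; `config` expects a config object, not a dict.
	text = pipe("Akşamüstü yolda ilerlerken, ", max_length=800)[0]["generated_text"]
	print(text)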
	```

	### How to clone the model repo?
	```
	git lfs install
	git clone https://huggingface.co/ahmet1338/gpt2-turkish-cased
	```
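
As an alternative to `git clone`, the repository files can also be fetched from Python. This is a small sketch, assuming the `huggingface_hub` package is installed; the wrapper name `download_model_repo` is hypothetical, while `snapshot_download` is the library function it calls.

```python
def download_model_repo(repo_id="ahmet1338/gpt2-turkish-cased", local_dir=None):
    """Download all files of the model repo and return the local folder path."""
    # Imported lazily so that defining the function does not require the library.
    from huggingface_hub import snapshot_download

    # With local_dir=None the files are stored in the Hugging Face cache directory.
    return snapshot_download(repo_id=repo_id, local_dir=local_dir)
```

Calling `download_model_repo(local_dir="gpt2-turkish-cased")` places the files in that folder, much like the clone above but without Git history.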