ahmet1338 committed on
Commit
e113261
1 Parent(s): e9e1c9d

Update README.md

Files changed (1)
  1. README.md +13 -14
README.md CHANGED
@@ -9,23 +9,21 @@ license: mit
 metrics:
 - accuracy
 ---
-# 🇹🇷 Turkish GPT-2 Model
+# Turkish GPT-2 Model (Experimental)

-In this repository I release GPT-2 model, that was trained on various texts for Turkish.
+I have released a GPT-2 model for Turkish that I trained on a variety of texts.

-The model is meant to be an entry point for fine-tuning on other texts.
+The model is intended as a starting point for fine-tuning on other texts.

-## Training corpora
+## Training Source

 I used a Turkish corpus drawn from a variety of written and spoken sources.

-With the Tokenizers library, I created a 52K BPE vocab based on the training corpus.
+Using the Tokenizers library, I created a 50k BPE vocabulary based on the training corpus.

-After creating the vocab, I could train the GPT-2 for Turkish on over the complete training corpus (five epochs).
-
-Logs during training:
-https://tensorboard.dev/experiment/3AWKv8bBTaqcqZP5frtGkw/#scalars
+After creating the vocabulary, I trained GPT-2 for Turkish on the complete training corpus (ten epochs).
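
The vocabulary-building step itself is not shown in the README. A minimal sketch of how a 50k byte-level BPE vocabulary could be created with the Hugging Face tokenizers library follows; the corpus file name is hypothetical.

``` python
from tokenizers import ByteLevelBPETokenizer

# Train a byte-level BPE tokenizer on the Turkish corpus
# ("turkish_corpus.txt" is a hypothetical file name).
tokenizer = ByteLevelBPETokenizer()
tokenizer.train(
    files=["turkish_corpus.txt"],
    vocab_size=50000,                  # the 50k vocabulary mentioned above
    min_frequency=2,
    special_tokens=["<|endoftext|>"],  # GPT-2's end-of-text token
)

# Writes vocab.json and merges.txt, the files a GPT-2 tokenizer consumes.
tokenizer.save_model(".")
```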
@@ -35,16 +33,17 @@ The model itself can be used in this way:

 ``` python
 from transformers import AutoTokenizer, AutoModelWithLMHead
-tokenizer = AutoTokenizer.from_pretrained("ahmet1338/gpt2-turkish-cased")
-model = AutoModelWithLMHead.from_pretrained("ahmet1338/gpt2-turkish-cased")
+tokenizer = AutoTokenizer.from_pretrained("ahmet1338/gpt-2-experimental")
+model = AutoModelWithLMHead.from_pretrained("ahmet1338/gpt-2-experimental")
 ```

-Here's an example that shows how to use the great Transformers Pipelines for generating text:
+To generate text, we can use the Transformers pipeline:

 ``` python
 from transformers import pipeline
-pipe = pipeline('text-generation', model="ahmet1338/gpt2-turkish-cased",
-                tokenizer="ahmet1338/gpt2-turkish-cased", config={'max_length':800})
+pipe = pipeline('text-generation', model="ahmet1338/gpt-2-experimental",
+                tokenizer="ahmet1338/gpt-2-experimental", config={'max_length':800})
 text = pipe("Akşamüstü yolda ilerlerken, ")[0]["generated_text"]
 print(text)
 ```
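
A caveat on the snippets above: `AutoModelWithLMHead` is deprecated in current versions of transformers, with `AutoModelForCausalLM` as the replacement for GPT-2-style models, and generation parameters such as `max_length` are normally passed to the pipeline call or to `generate()` directly rather than through `config`. A sketch of the same usage with the current API, assuming the checkpoint name from the diff:

``` python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("ahmet1338/gpt-2-experimental")
model = AutoModelForCausalLM.from_pretrained("ahmet1338/gpt-2-experimental")

# Tokenize a Turkish prompt and sample a continuation.
inputs = tokenizer("Akşamüstü yolda ilerlerken, ", return_tensors="pt")
outputs = model.generate(**inputs, max_length=100, do_sample=True, top_k=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```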
 
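
Since the README positions the model as a starting point for fine-tuning, a minimal fine-tuning sketch with the Trainer API is given below; the dataset file and output directory are hypothetical, and the hyperparameters are illustrative only.

``` python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("ahmet1338/gpt-2-experimental")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("ahmet1338/gpt-2-experimental")

# "my_texts.txt" is a hypothetical fine-tuning corpus, one document per line.
dataset = load_dataset("text", data_files={"train": "my_texts.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="gpt2-turkish-finetuned", num_train_epochs=1),
    train_dataset=tokenized,
    # mlm=False selects the causal-LM objective that GPT-2 is trained with.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```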