ahmet1338 committed
Commit 611fa00
1 Parent(s): f1a0ec9

Readme file updated

Files changed (1)
  1. README.md +12 -18
README.md CHANGED
@@ -1,10 +1,13 @@
 ---
-language: "tr"
+language: tr
 tags:
 - turkish
 - tr
 - gpt2-tr
 - gpt2-turkish
+license: mit
+metrics:
+- accuracy
 ---
 # 🇹🇷 Turkish GPT-2 Model
@@ -14,24 +17,17 @@ The model is meant to be an entry point for fine-tuning on other texts.
 
 ## Training corpora
 
-I used a Turkish corpora that is taken from oscar-corpus.
+I used a Turkish corpus taken from different written and oral sources.
 
-It was possible to create byte-level BPE with Tokenizers library of Huggingface.
-
-With the Tokenizers library, I created a 52K byte-level BPE vocab based on the training corpora.
+With the Tokenizers library, I created a 52K BPE vocab based on the training corpus.
 
-After creating the vocab, I could train the GPT-2 for Turkish on two 2080TI over the complete training corpus (five epochs).
+After creating the vocab, I could train the GPT-2 for Turkish over the complete training corpus (five epochs).
 
 Logs during training:
 https://tensorboard.dev/experiment/3AWKv8bBTaqcqZP5frtGkw/#scalars
 
-## Model weights
-
-Both PyTorch and Tensorflow compatible weights are available.
-
-| Model                               | Downloads
-| ----------------------------------- | ---------
-| `redrussianarmy/gpt2-turkish-cased` | [`config.json`](https://huggingface.co/redrussianarmy/gpt2-turkish-cased/resolve/main/config.json) • [`merges.txt`](https://huggingface.co/redrussianarmy/gpt2-turkish-cased/resolve/main/merges.txt) • [`pytorch_model.bin`](https://huggingface.co/redrussianarmy/gpt2-turkish-cased/resolve/main/pytorch_model.bin) • [`special_tokens_map.json`](https://huggingface.co/redrussianarmy/gpt2-turkish-cased/resolve/main/special_tokens_map.json) • [`tf_model.h5`](https://huggingface.co/redrussianarmy/gpt2-turkish-cased/resolve/main/tf_model.h5) • [`tokenizer_config.json`](https://huggingface.co/redrussianarmy/gpt2-turkish-cased/resolve/main/tokenizer_config.json) • [`training_args.bin`](https://huggingface.co/redrussianarmy/gpt2-turkish-cased/resolve/main/training_args.bin) • [`vocab.json`](https://huggingface.co/redrussianarmy/gpt2-turkish-cased/resolve/main/vocab.json)
 
 ## Using the model
@@ -39,16 +35,16 @@ The model itself can be used in this way:
 
 ``` python
 from transformers import AutoTokenizer, AutoModelWithLMHead
-tokenizer = AutoTokenizer.from_pretrained("redrussianarmy/gpt2-turkish-cased")
-model = AutoModelWithLMHead.from_pretrained("redrussianarmy/gpt2-turkish-cased")
+tokenizer = AutoTokenizer.from_pretrained("ahmet1338/gpt2-turkish-cased")
+model = AutoModelWithLMHead.from_pretrained("ahmet1338/gpt2-turkish-cased")
 ```
 
 Here's an example that shows how to use the great Transformers Pipelines for generating text:
 
 ``` python
 from transformers import pipeline
-pipe = pipeline('text-generation', model="redrussianarmy/gpt2-turkish-cased",
-                tokenizer="redrussianarmy/gpt2-turkish-cased", config={'max_length':800})
+pipe = pipeline('text-generation', model="ahmet1338/gpt2-turkish-cased",
+                tokenizer="ahmet1338/gpt2-turkish-cased", config={'max_length':800})
 text = pipe("Akşamüstü yolda ilerlerken, ")[0]["generated_text"]
 print(text)
 ```
@@ -56,8 +52,6 @@ print(text)
 ### How to clone the model repo?
 ```
 git lfs install
-git clone https://huggingface.co/redrussianarmy/gpt2-turkish-cased
+git clone https://huggingface.co/ahmet1338/gpt2-turkish-cased
 ```
 
-## Contact (Bugs, Feedback, Contribution and more)
-For questions about the GPT2-Turkish model, just open an issue [here](https://github.com/redrussianarmy/gpt2-turkish/issues) 🤗
 
README.md (updated)

---
language: tr
tags:
- turkish
- tr
- gpt2-tr
- gpt2-turkish
license: mit
metrics:
- accuracy
---

# 🇹🇷 Turkish GPT-2 Model

The model is meant to be an entry point for fine-tuning on other texts.

## Training corpora

I used a Turkish corpus taken from different written and oral sources.

With the Tokenizers library, I created a 52K BPE vocab based on the training corpus.
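
For reference, a vocab of this size can be produced with the Tokenizers library roughly as follows. This is a minimal sketch rather than the exact recipe used for this model: the corpus path, the `min_frequency` cutoff, and the special tokens are illustrative assumptions (byte-level BPE, as an earlier revision of this card describes).

``` python
from tokenizers import ByteLevelBPETokenizer

# Minimal sketch: train a byte-level BPE tokenizer on raw text files.
# "corpus.txt" is a placeholder path, not the actual corpus used here.
tokenizer = ByteLevelBPETokenizer()
tokenizer.train(
    files=["corpus.txt"],
    vocab_size=52000,  # 52K vocab, matching the model card
    min_frequency=2,   # assumption; the real cutoff is not documented
    special_tokens=["<s>", "<pad>", "</s>", "<unk>", "<mask>"],  # illustrative
)

# Writes vocab.json and merges.txt, the files a GPT-2 tokenizer consumes.
tokenizer.save_model(".")
```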

After creating the vocab, I could train the GPT-2 for Turkish over the complete training corpus (five epochs).

Logs during training:
https://tensorboard.dev/experiment/3AWKv8bBTaqcqZP5frtGkw/#scalars

## Using the model

The model itself can be used in this way:
``` python
from transformers import AutoTokenizer, AutoModelWithLMHead
tokenizer = AutoTokenizer.from_pretrained("ahmet1338/gpt2-turkish-cased")
model = AutoModelWithLMHead.from_pretrained("ahmet1338/gpt2-turkish-cased")
```
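
With the tokenizer and model loaded as above, text can also be generated by calling `generate()` directly. This is a small usage sketch; the sampling settings are illustrative assumptions, not tuned recommendations:

``` python
# Continues the snippet above: encode a prompt, sample a continuation.
input_ids = tokenizer.encode("Akşamüstü yolda ilerlerken, ", return_tensors="pt")
output_ids = model.generate(
    input_ids,
    max_length=100,  # illustrative length
    do_sample=True,  # sample instead of greedy decoding
    top_k=50,
    top_p=0.95,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```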

Here's an example that shows how to use the great Transformers Pipelines for generating text:
``` python
from transformers import pipeline
pipe = pipeline('text-generation', model="ahmet1338/gpt2-turkish-cased",
                tokenizer="ahmet1338/gpt2-turkish-cased", config={'max_length':800})
text = pipe("Akşamüstü yolda ilerlerken, ")[0]["generated_text"]
print(text)
```
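
Note that in more recent transformers releases, generation options such as `max_length` are usually passed with the call itself rather than through `config`. A hedged variant of the snippet above:

``` python
from transformers import pipeline

pipe = pipeline('text-generation', model="ahmet1338/gpt2-turkish-cased",
                tokenizer="ahmet1338/gpt2-turkish-cased")
# Here max_length is forwarded to generate() for this call only.
text = pipe("Akşamüstü yolda ilerlerken, ", max_length=800)[0]["generated_text"]
print(text)
```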
 
### How to clone the model repo?
```
git lfs install
git clone https://huggingface.co/ahmet1338/gpt2-turkish-cased
```
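
Since the model is meant to be an entry point for fine-tuning on other texts, here is a minimal fine-tuning sketch using the Trainer API. Every path and hyperparameter below is an illustrative assumption, not a documented recipe:

``` python
from transformers import (AutoModelWithLMHead, AutoTokenizer,
                          DataCollatorForLanguageModeling, TextDataset,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("ahmet1338/gpt2-turkish-cased")
model = AutoModelWithLMHead.from_pretrained("ahmet1338/gpt2-turkish-cased")

# "my_corpus.txt" is a placeholder for your own Turkish text file.
dataset = TextDataset(tokenizer=tokenizer, file_path="my_corpus.txt", block_size=128)
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="gpt2-turkish-finetuned",  # placeholder output path
        num_train_epochs=1,                   # illustrative
        per_device_train_batch_size=4,        # illustrative
    ),
    data_collator=collator,
    train_dataset=dataset,
)
trainer.train()
```

The fine-tuned weights land in `output_dir` and can be reloaded with `from_pretrained()` exactly as in the usage examples above.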