Upload 6 files
- README.md +111 -3
- chatbot.py +141 -0
- dataset.py +101 -0
- model.py +265 -0
- requirements.txt +7 -0
- training.py +150 -0
README.md
CHANGED
@@ -1,3 +1,111 @@
# Turkish Chatbot with a Transformer Model

This repository contains the code for a Turkish chatbot built with a Transformer model. The chatbot is trained on a dataset of Turkish conversations and can generate responses to user input.

## Files

* **model.py:** Defines the Transformer model architecture, including the encoder, the decoder, and the attention mechanisms.
* **dataset.py:** Contains functions for loading, preprocessing, and tokenizing the conversation dataset.
* **chatbot.py:** Handles the interactive chat functionality, including response generation and collecting user feedback.
* **training.py:** Builds the dataset, trains the Transformer model, and saves it to disk.
* **data/lines.txt:** Stores the raw text lines used for training.
* **data/conversations.txt:** Contains the conversation pairs used for training.
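The modules fit together roughly as follows. This is a minimal sketch based on `training.py` and `dataset.py`, assuming the `data/` files described below are present; the hyperparameter values are illustrative, not recommended settings:

```python
import argparse

import model
from dataset import get_dataset

# Illustrative hyperparameters; training.py builds an equivalent namespace via argparse.
hparams = argparse.Namespace(
    max_samples=25000, max_length=40, batch_size=64,
    num_layers=2, num_units=512, d_model=256, num_heads=8,
    dropout=0.1, activation="relu",
)

dataset, tokenizer = get_dataset(hparams)  # builds the subword tokenizer and the tf.data pipeline
chatbot = model.transformer(hparams)       # encoder + decoder + final vocabulary projection
chatbot.summary()
```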
## Usage

1. **Clone the repository:**
   ```shell
   git clone https://github.com/EmirhanOzl/transformer-turkish-chatbot.git
   ```
2. **Install the required dependencies:**
   ```shell
   pip install -r requirements.txt
   ```
3. **Train the model:**
   ```shell
   python training.py
   ```
4. **Run the chatbot:**
   ```shell
   python chatbot.py
   ```

## Training Data

The chatbot is trained on a dataset of Turkish conversations stored in the `data/` directory. The dataset consists of two files, both of which must follow the format below:

* **lines.txt:** Contains the raw text lines used for training. Each line has the following format:
  ```
  [LINE_ID] +++$+++ [USER_ID] +++$+++ [MOVIE_ID] +++$+++ [CHARACTER_NAME] +++$+++ [TEXT]
  ```

  * `LINE_ID` is the ID of the line.
  * `USER_ID` is the ID of the user who speaks the line.
  * `MOVIE_ID` is the ID of the movie the conversation is about (optional).
  * `CHARACTER_NAME` is the name of the character who says the line (optional).
  * `TEXT` is the text of the line.

* **conversations.txt:** Contains the conversation pairs used for training. Each line has the following format:
  ```
  [USER_ID] +++$+++ [BOT_ID] +++$+++ [MOVIE_ID] +++$+++ [CONVERSATION]
  ```

  * `USER_ID` is the ID of the user.
  * `BOT_ID` is the ID of the chatbot.
  * `MOVIE_ID` is the ID of the movie the conversation is about (optional).
  * `CONVERSATION` is a comma-separated list of line IDs representing the turns of the conversation.

You can use the provided dataset or create your own dataset in the same format.

**Note:** The `MOVIE_ID` and `CHARACTER_NAME` fields are optional; a default placeholder value can be used in their place.
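For illustration, a hypothetical pair of entries (the IDs here are made up) could look like this in `lines.txt`:

```
L1 +++$+++ u1 +++$+++ m1 +++$+++ Ben +++$+++ Merhaba nasılsın?
L2 +++$+++ u2 +++$+++ m1 +++$+++ Bot +++$+++ İyiyim, teşekkür ederim.
```

with the matching entry in `conversations.txt`:

```
u1 +++$+++ u2 +++$+++ m1 +++$+++ ['L1', 'L2']
```

This is also the layout that `chatbot.py` appends to both files when a user marks a response as helpful.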
## Example Output

The results below were obtained with the following hyperparameters:

```python
self.save_model = "model.h5"
self.max_samples = 25000
self.max_length = 60
self.batch_size = 128
self.num_layers = 3
self.num_units = 512
self.d_model = 256
self.num_heads = 8
self.dropout = 0.1
self.activation = "relu"
self.epochs = 100
```
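These values are listed in attribute form; in this repository they are supplied as command-line flags. A training run with the same settings would look roughly like the following (flag names taken from the `argparse` section of `training.py`; the script's built-in defaults differ from the values above):

```shell
python training.py --save_model model.h5 --max_samples 25000 --max_length 60 \
    --batch_size 128 --num_layers 3 --num_units 512 --d_model 256 \
    --num_heads 8 --dropout 0.1 --activation relu --epochs 100
```

When chatting afterwards, pass the same `--max_samples` and `--max_length` to `chatbot.py`, since it rebuilds the subword tokenizer from the data files at startup and uses `--max_length` as the decoding limit.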

Training results:

```
Epoch 100/100
24/24 [==============================] - 192s 8s/step - loss: 0.0145 - accuracy: 0.3174
```

Sample conversation:

**User:** Merhaba nasılsın?
**Bot:** Merhaba! Ben iyiyim, teşekkür ederim. Sana nasıl yardımcı olabilirim bugün?

**User:** Sence de gökyüzü çok güzel değil mi?
**Bot:** O zaman hemen başlayalım, bu filmi izlemek için daha fazla beklemek istemiyorum.

**User:** Sanırım uzaklara gideceğim.
**Bot:** Nereye gitmek istersin?

**User:** Nereye gitmek istersin?
**Bot:** Belki de dünya dışına. Sence dünya dışını görebilir miyiz?

**User:** Belki de dünya dışına. Sence dünya dışını görebilir miyiz?
**Bot:** Belki bir gün teknoloji bize bu imkanı sunar, kim bilir?

**User:** Belki bir gün teknoloji bize bu imkanı sunar, kim bilir?
**Bot:** Ne zaman istersen yardımcı olmaktan mutluluk duyarım. Yaratıcı bir süreç geçirmeni dilerim!

Note: The chatbot's responses may vary depending on the training data and hyperparameters used.

chatbot.py
ADDED
@@ -0,0 +1,141 @@
import argparse
import tensorflow as tf
import model
from dataset import get_dataset, preprocess_sentence


def inference(hparams, chatbot, tokenizer, sentence):
    sentence = preprocess_sentence(sentence)

    sentence = tf.expand_dims(
        hparams.start_token + tokenizer.encode(sentence) + hparams.end_token, axis=0
    )

    output = tf.expand_dims(hparams.start_token, 0)

    # Greedy decoding: append the most likely next token until the end token appears.
    for _ in range(hparams.max_length):
        predictions = chatbot(inputs=[sentence, output], training=False)

        predictions = predictions[:, -1:, :]
        predicted_id = tf.cast(tf.argmax(predictions, axis=-1), tf.int32)

        if tf.equal(predicted_id, hparams.end_token[0]):
            break

        output = tf.concat([output, predicted_id], axis=-1)

    return tf.squeeze(output, axis=0)


def predict(hparams, chatbot, tokenizer, sentence):
    prediction = inference(hparams, chatbot, tokenizer, sentence)
    predicted_sentence = tokenizer.decode(
        [i for i in prediction if i < tokenizer.vocab_size]
    )
    return predicted_sentence


def read_file(file_path):
    with open(file_path, 'r', encoding='utf-8') as file:
        lines = file.readlines()
    return lines


def append_to_file(file_path, line):
    with open(file_path, 'a', encoding='utf-8') as file:
        file.write(f"{line}\n")


def get_last_ids(lines_file, conversations_file):
    # IDs look like "L12", "u3", "m1"; strip the prefix character and parse the number.
    lines = read_file(lines_file)
    conversations = read_file(conversations_file)

    last_line = lines[-1]
    last_conversation = conversations[-1]

    last_line_id = int(last_line.split(" +++$+++ ")[0][1:])
    last_user_id = int(last_conversation.split(" +++$+++ ")[1][1:])
    last_movie_id = int(last_conversation.split(" +++$+++ ")[2][1:])

    return last_line_id, last_user_id, last_movie_id


def update_data_files(user_input, bot_response, lines_file='data/lines.txt', conversations_file='data/conversations.txt'):
    # Append the latest exchange to the training data files.
    last_line_id, last_user_id, last_movie_id = get_last_ids(lines_file, conversations_file)

    new_line_id = f"L{last_line_id + 1}"
    new_bot_line_id = f"L{last_line_id + 2}"
    new_user_id = f"u{last_user_id + 1}"
    new_bot_user_id = f"u{last_user_id + 2}"
    new_movie_id = f"m{last_movie_id + 1}"

    append_to_file(lines_file, f"{new_line_id} +++$+++ {new_user_id} +++$+++ {new_movie_id} +++$+++ Ben +++$+++ {user_input}")
    append_to_file(lines_file, f"{new_bot_line_id} +++$+++ {new_bot_user_id} +++$+++ {new_movie_id} +++$+++ Bot +++$+++ {bot_response}")

    new_conversation = f"{new_user_id} +++$+++ {new_bot_user_id} +++$+++ {new_movie_id} +++$+++ ['{new_line_id}', '{new_bot_line_id}']"
    append_to_file(conversations_file, new_conversation)


def get_feedback():
    # Input is lowercased, so compare against the lowercase form.
    feedback = input("Bu cevap yardımcı oldu mu? (Evet/Hayır): ").lower()
    return feedback == "evet"


def chat(hparams, chatbot, tokenizer):
    print("\nCHATBOT")

    for _ in range(5):
        sentence = input("Sen: ")
        output = predict(hparams, chatbot, tokenizer, sentence)
        print(f"\nBOT: {output}")

        user_input = sentence
        bot_response = output

        feedback = get_feedback()

        if feedback:
            update_data_files(user_input, bot_response)


def main(hparams):
    # Rebuild the tokenizer (and start/end tokens) so they match the training run.
    _, token = get_dataset(hparams)

    tf.keras.backend.clear_session()
    chatbot = tf.keras.models.load_model(
        hparams.save_model,
        custom_objects={
            "PositionalEncoding": model.PositionalEncoding,
            "MultiHeadAttention": model.MultiHeadAttention,
        },
        compile=False,
    )

    chat(hparams, chatbot, token)


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--save_model", default="model.h5", type=str, help="path to the saved model"
    )
    parser.add_argument(
        "--max_samples",
        default=25000,
        type=int,
        help="maximum number of conversation pairs to use",
    )
    parser.add_argument(
        "--max_length", default=40, type=int, help="maximum sentence length"
    )
    parser.add_argument("--batch_size", default=64, type=int)
    parser.add_argument("--num_layers", default=2, type=int)
    parser.add_argument("--num_units", default=512, type=int)
    parser.add_argument("--d_model", default=256, type=int)
    parser.add_argument("--num_heads", default=8, type=int)
    parser.add_argument("--dropout", default=0.1, type=float)
    parser.add_argument("--activation", default="relu", type=str)
    parser.add_argument("--epochs", default=80, type=int)

    main(parser.parse_args())
dataset.py
ADDED
@@ -0,0 +1,101 @@
import re
import tensorflow as tf
import tensorflow_datasets as tfds
import nltk
from nltk.stem import WordNetLemmatizer

nltk.download('wordnet')
nltk.download('punkt')

lemmatizer = WordNetLemmatizer()


def preprocess_sentence(sentence):
    sentence = sentence.lower().strip()

    # Put spaces around punctuation, collapse repeated spaces, then drop special characters.
    sentence = re.sub(r"([?.!¿])", r" \1 ", sentence)
    sentence = re.sub(r'[" "]+', " ", sentence)
    sentence = re.sub(r"[-()\"#/@;:<>{}+=~|.?,]", "", sentence)

    # Keep only Latin letters and Turkish characters plus basic punctuation.
    sentence = re.sub(r"[^a-zA-ZğüşöçıİĞÜŞÖÇ?.!,¿]+", " ", sentence)
    sentence = sentence.strip()

    sentence = ' '.join([lemmatizer.lemmatize(w) for w in nltk.word_tokenize(sentence)])

    return sentence


def load_conversations(hparams, lines_file, conversations_file):
    # Map each line ID to its text.
    id2line = {}
    with open(lines_file, encoding="utf-8", errors="ignore") as file:
        lines = file.readlines()
    for line in lines:
        parts = line.replace("\n", "").split(" +++$+++ ")
        id2line[parts[0]] = parts[4]

    questions = []
    answers = []

    with open(conversations_file, "r", encoding="utf-8") as file:
        lines = file.readlines()
    for line in lines:
        parts = line.replace("\n", "").split(" +++$+++ ")
        # parts[3] looks like "['L1', 'L2']"; strip the brackets and the quotes around each ID.
        conversation = [line_id[1:-1] for line_id in parts[3][1:-1].split(", ")]
        for i in range(len(conversation) - 1):
            questions.append(preprocess_sentence(id2line[conversation[i]]))
            answers.append(preprocess_sentence(id2line[conversation[i + 1]]))
            if len(questions) >= hparams.max_samples:
                return questions, answers

    return questions, answers


def tokenize(hparams, tokenizer, questions, answers):
    tokenized_inputs, tokenized_outputs = [], []

    for (question, answer) in zip(questions, answers):
        sentence1 = hparams.start_token + tokenizer.encode(question) + hparams.end_token
        sentence2 = hparams.start_token + tokenizer.encode(answer) + hparams.end_token

        # Keep only pairs that fit within the maximum length.
        if len(sentence1) <= hparams.max_length and len(sentence2) <= hparams.max_length:
            tokenized_inputs.append(sentence1)
            tokenized_outputs.append(sentence2)

    tokenized_inputs = tf.keras.preprocessing.sequence.pad_sequences(
        tokenized_inputs, maxlen=hparams.max_length, padding="post")
    tokenized_outputs = tf.keras.preprocessing.sequence.pad_sequences(
        tokenized_outputs, maxlen=hparams.max_length, padding="post")

    return tokenized_inputs, tokenized_outputs


def get_dataset(hparams):
    lines_file = "data/lines.txt"
    conversations_file = "data/conversations.txt"

    questions, answers = load_conversations(hparams, lines_file, conversations_file)

    tokenizer = tfds.deprecated.text.SubwordTextEncoder.build_from_corpus(
        questions + answers, target_vocab_size=2**13)

    tokenizer.save_to_file('tokenizer')

    # Reserve two extra IDs for the start and end tokens.
    hparams.start_token = [tokenizer.vocab_size]
    hparams.end_token = [tokenizer.vocab_size + 1]
    hparams.vocab_size = tokenizer.vocab_size + 2

    questions, answers = tokenize(hparams, tokenizer, questions, answers)

    # Teacher forcing: the decoder input is the answer shifted right by one token.
    dataset = tf.data.Dataset.from_tensor_slices(
        ({"inputs": questions, "dec_inputs": answers[:, :-1]}, answers[:, 1:])
    )

    dataset = dataset.cache()
    dataset = dataset.shuffle(len(questions))
    dataset = dataset.batch(hparams.batch_size)
    dataset = dataset.prefetch(tf.data.experimental.AUTOTUNE)

    return dataset, tokenizer
model.py
ADDED
@@ -0,0 +1,265 @@
import tensorflow as tf


def sdp_attention(query, key, value, mask):
    # Scaled dot-product attention.
    matmul_qk = tf.matmul(query, key, transpose_b=True)
    depth = tf.cast(tf.shape(key)[-1], tf.float32)
    logits = matmul_qk / tf.math.sqrt(depth)
    if mask is not None:
        # Push masked positions towards -inf before the softmax.
        logits += mask * -1e9
    attention_weights = tf.nn.softmax(logits, axis=-1)
    output = tf.matmul(attention_weights, value)
    return output


class MultiHeadAttention(tf.keras.layers.Layer):
    def __init__(self, num_heads, d_model, **kwargs):
        assert d_model % num_heads == 0
        super(MultiHeadAttention, self).__init__(**kwargs)
        self.num_heads = num_heads
        self.d_model = d_model

        self.depth = self.d_model // self.num_heads

        self.query_dense = tf.keras.layers.Dense(self.d_model)
        self.key_dense = tf.keras.layers.Dense(self.d_model)
        self.value_dense = tf.keras.layers.Dense(self.d_model)

        self.dense = tf.keras.layers.Dense(self.d_model)

    def get_config(self):
        config = super(MultiHeadAttention, self).get_config()
        config.update({"num_heads": self.num_heads, "d_model": self.d_model})
        return config

    def split_heads(self, inputs: tf.Tensor, batch_size: int):
        # (batch, seq_len, d_model) -> (batch, num_heads, seq_len, depth)
        inputs = tf.keras.layers.Lambda(
            lambda inputs: tf.reshape(
                inputs, shape=(batch_size, -1, self.num_heads, self.depth))
        )(inputs)
        return tf.keras.layers.Lambda(
            lambda inputs: tf.transpose(inputs, perm=[0, 2, 1, 3])
        )(inputs)

    def call(self, inputs: tf.Tensor):
        query, key, value, mask = (
            inputs["query"],
            inputs["key"],
            inputs["value"],
            inputs["mask"],
        )
        batch_size = tf.shape(query)[0]

        query = self.query_dense(query)
        key = self.key_dense(key)
        value = self.value_dense(value)

        query = self.split_heads(query, batch_size)
        key = self.split_heads(key, batch_size)
        value = self.split_heads(value, batch_size)

        scaled_attention = sdp_attention(query, key, value, mask)
        scaled_attention = tf.keras.layers.Lambda(
            lambda scaled_attention: tf.transpose(scaled_attention, perm=[0, 2, 1, 3])
        )(scaled_attention)

        # Merge the heads back into a single d_model-sized representation.
        concat_attention = tf.keras.layers.Lambda(
            lambda scaled_attention: tf.reshape(
                scaled_attention, (batch_size, -1, self.d_model)
            )
        )(scaled_attention)

        outputs = self.dense(concat_attention)

        return outputs


def create_padding_mask(x):
    mask = tf.cast(tf.math.equal(x, 0), dtype=tf.float32)
    return mask[:, tf.newaxis, tf.newaxis, :]


def create_look_ahead_mask(x):
    seq_len = tf.shape(x)[1]
    look_ahead_mask = 1 - tf.linalg.band_part(
        tf.ones((seq_len, seq_len), dtype=tf.float32), -1, 0
    )
    padding_mask = create_padding_mask(x)
    return tf.maximum(look_ahead_mask, padding_mask)


class PositionalEncoding(tf.keras.layers.Layer):
    def __init__(self, position: int, d_model: int, **kwargs):
        super(PositionalEncoding, self).__init__(**kwargs)
        self.position = position
        self.d_model = d_model
        self.pos_encoding = self.positional_encoding(position, d_model)

    def get_config(self):
        config = super(PositionalEncoding, self).get_config()
        config.update({"position": self.position, "d_model": self.d_model})
        return config

    def get_angles(self, position: tf.Tensor, i: tf.Tensor, d_model: tf.Tensor):
        angles = 1 / tf.pow(10000, (2 * (i // 2)) / d_model)
        return position * angles

    def positional_encoding(self, position: int, d_model: int):
        angle_rads = self.get_angles(
            position=tf.cast(tf.range(position)[:, tf.newaxis], dtype=tf.float32),
            i=tf.cast(tf.range(d_model)[tf.newaxis, :], dtype=tf.float32),
            d_model=tf.cast(d_model, dtype=tf.float32),
        )
        sines = tf.math.sin(angle_rads[:, 0::2])
        cosines = tf.math.cos(angle_rads[:, 1::2])

        pos_encoding = tf.concat([sines, cosines], axis=-1)
        pos_encoding = pos_encoding[tf.newaxis, ...]
        return pos_encoding

    def call(self, inputs: tf.Tensor):
        return inputs + self.pos_encoding[:, : tf.shape(inputs)[1], :]


def encoder_layer(hparams, name: str = "encoder_layer"):
    inputs = tf.keras.Input(shape=(None, hparams.d_model), name="inputs")
    padding_mask = tf.keras.Input(shape=(1, 1, None), name="padding_mask")

    # Self-attention sublayer with residual connection and layer normalization.
    attention = MultiHeadAttention(
        num_heads=hparams.num_heads, d_model=hparams.d_model, name="attention"
    )({"query": inputs, "key": inputs, "value": inputs, "mask": padding_mask})
    attention = tf.keras.layers.Dropout(hparams.dropout)(attention)
    attention += tf.cast(inputs, dtype=tf.float32)
    attention = tf.keras.layers.LayerNormalization(epsilon=1e-6)(attention)

    # Position-wise feed-forward sublayer.
    outputs = tf.keras.layers.Dense(hparams.num_units, activation=hparams.activation)(
        attention
    )
    outputs = tf.keras.layers.Dense(hparams.d_model)(outputs)
    outputs = tf.keras.layers.Dropout(hparams.dropout)(outputs)
    outputs += attention
    outputs = tf.keras.layers.LayerNormalization(epsilon=1e-6)(outputs)

    return tf.keras.Model(inputs=[inputs, padding_mask], outputs=outputs, name=name)


def encoder(hparams, name: str = "encoder"):
    inputs = tf.keras.Input(shape=(None,), name="inputs")
    padding_mask = tf.keras.Input(shape=(1, 1, None), name="padding_mask")

    embeddings = tf.keras.layers.Embedding(hparams.vocab_size, hparams.d_model)(inputs)
    embeddings *= tf.math.sqrt(tf.cast(hparams.d_model, dtype=tf.float32))
    embeddings = PositionalEncoding(
        position=hparams.vocab_size, d_model=hparams.d_model
    )(embeddings)

    outputs = tf.keras.layers.Dropout(hparams.dropout)(embeddings)

    for i in range(hparams.num_layers):
        outputs = encoder_layer(hparams, name=f"encoder_layer_{i}")(
            [outputs, padding_mask]
        )

    return tf.keras.Model(inputs=[inputs, padding_mask], outputs=outputs, name=name)


def decoder_layer(hparams, name: str = "decoder_layer"):
    inputs = tf.keras.Input(shape=(None, hparams.d_model), name="inputs")
    enc_outputs = tf.keras.Input(shape=(None, hparams.d_model), name="encoder_outputs")
    look_ahead_mask = tf.keras.Input(shape=(1, None, None), name="look_ahead_mask")
    padding_mask = tf.keras.Input(shape=(1, 1, None), name="padding_mask")

    # Masked self-attention over the decoder inputs.
    attention1 = MultiHeadAttention(
        num_heads=hparams.num_heads, d_model=hparams.d_model, name="attention_1"
    )(
        inputs={
            "query": inputs,
            "key": inputs,
            "value": inputs,
            "mask": look_ahead_mask,
        }
    )
    attention1 += tf.cast(inputs, dtype=tf.float32)
    attention1 = tf.keras.layers.LayerNormalization(epsilon=1e-6)(attention1)

    # Encoder-decoder attention over the encoder outputs, with a single residual connection.
    attention2 = MultiHeadAttention(
        num_heads=hparams.num_heads, d_model=hparams.d_model, name="attention_2"
    )(
        inputs={
            "query": attention1,
            "key": enc_outputs,
            "value": enc_outputs,
            "mask": padding_mask,
        }
    )
    attention2 = tf.keras.layers.Dropout(hparams.dropout)(attention2)
    attention2 += attention1
    attention2 = tf.keras.layers.LayerNormalization(epsilon=1e-6)(attention2)

    outputs = tf.keras.layers.Dense(hparams.num_units, activation=hparams.activation)(
        attention2
    )
    outputs = tf.keras.layers.Dense(hparams.d_model)(outputs)
    outputs = tf.keras.layers.Dropout(hparams.dropout)(outputs)
    outputs += attention2
    outputs = tf.keras.layers.LayerNormalization(epsilon=1e-6)(outputs)

    return tf.keras.Model(
        inputs=[inputs, enc_outputs, look_ahead_mask, padding_mask],
        outputs=outputs,
        name=name,
    )


def decoder(hparams, name: str = "decoder"):
    inputs = tf.keras.Input(shape=(None,), name="inputs")
    enc_outputs = tf.keras.Input(shape=(None, hparams.d_model), name="encoder_outputs")
    look_ahead_mask = tf.keras.Input(shape=(1, None, None), name="look_ahead_mask")
    padding_mask = tf.keras.Input(shape=(1, 1, None), name="padding_mask")

    embeddings = tf.keras.layers.Embedding(hparams.vocab_size, hparams.d_model)(inputs)
    embeddings *= tf.math.sqrt(tf.cast(hparams.d_model, dtype=tf.float32))
    embeddings = PositionalEncoding(
        position=hparams.vocab_size, d_model=hparams.d_model
    )(embeddings)

    outputs = tf.keras.layers.Dropout(hparams.dropout)(embeddings)

    for i in range(hparams.num_layers):
        outputs = decoder_layer(
            hparams,
            name="decoder_layer_{}".format(i),
        )(inputs=[outputs, enc_outputs, look_ahead_mask, padding_mask])

    return tf.keras.Model(
        inputs=[inputs, enc_outputs, look_ahead_mask, padding_mask],
        outputs=outputs,
        name=name,
    )


def transformer(hparams, name: str = "transformer"):
    inputs = tf.keras.Input(shape=(None,), name="inputs")
    dec_inputs = tf.keras.Input(shape=(None,), name="dec_inputs")

    enc_padding_mask = tf.keras.layers.Lambda(
        create_padding_mask, output_shape=(1, 1, None), name="enc_padding_mask"
    )(inputs)

    look_ahead_mask = tf.keras.layers.Lambda(
        create_look_ahead_mask, output_shape=(1, None, None), name="look_ahead_mask"
    )(dec_inputs)

    dec_padding_mask = tf.keras.layers.Lambda(
        create_padding_mask, output_shape=(1, 1, None), name="dec_padding_mask"
    )(inputs)

    enc_outputs = encoder(hparams)(inputs=[inputs, enc_padding_mask])

    dec_outputs = decoder(hparams)(
        inputs=[dec_inputs, enc_outputs, look_ahead_mask, dec_padding_mask]
    )

    outputs = tf.keras.layers.Dense(hparams.vocab_size, name="outputs")(dec_outputs)

    return tf.keras.Model(inputs=[inputs, dec_inputs], outputs=outputs, name=name)
requirements.txt
ADDED
@@ -0,0 +1,7 @@
jupyter
matplotlib
tensorflow==2.9.1
tensorflow-addons==0.17.1
tensorflow-datasets==4.6.0
protobuf==3.20.3
nltk
training.py
ADDED
@@ -0,0 +1,150 @@
import argparse
import tensorflow as tf
import model
from dataset import get_dataset, preprocess_sentence


class CustomSchedule(tf.keras.optimizers.schedules.LearningRateSchedule):
    # Learning-rate schedule from "Attention Is All You Need" with linear warmup.
    def __init__(self, d_model: int, warmup_steps: int = 4000):
        super(CustomSchedule, self).__init__()
        self.d_model = tf.cast(d_model, dtype=tf.float32)
        self.warmup_steps = warmup_steps

    def __call__(self, step):
        arg1 = tf.math.rsqrt(step)
        arg2 = step * self.warmup_steps**-1.5
        return tf.math.rsqrt(self.d_model) * tf.math.minimum(arg1, arg2)


def inference(hparams, chatbot, tokenizer, sentence):
    sentence = preprocess_sentence(sentence)

    sentence = tf.expand_dims(
        hparams.start_token + tokenizer.encode(sentence) + hparams.end_token, axis=0
    )

    output = tf.expand_dims(hparams.start_token, 0)

    # Greedy decoding: append the most likely next token until the end token appears.
    for _ in range(hparams.max_length):
        predictions = chatbot(inputs=[sentence, output], training=False)

        predictions = predictions[:, -1:, :]
        predicted_id = tf.cast(tf.argmax(predictions, axis=-1), tf.int32)

        if tf.equal(predicted_id, hparams.end_token[0]):
            break

        output = tf.concat([output, predicted_id], axis=-1)

    return tf.squeeze(output, axis=0)


def predict(hparams, chatbot, tokenizer, sentence):
    prediction = inference(hparams, chatbot, tokenizer, sentence)
    predicted_sentence = tokenizer.decode(
        [i for i in prediction if i < tokenizer.vocab_size]
    )
    return predicted_sentence


def evaluate(hparams, chatbot, tokenizer):
    print("\nDeğerlendir")
    sentence = "Merhaba nasılsın?"
    output = predict(hparams, chatbot, tokenizer, sentence)
    print(f"input: {sentence}\noutput: {output}")

    sentence = "Sence de gökyüzü çok güzel değil mi?"
    output = predict(hparams, chatbot, tokenizer, sentence)
    print(f"\ninput: {sentence}\noutput: {output}")

    # Feed the model its own output a few times to check multi-turn behaviour.
    sentence = "Sanırım uzaklara gideceğim."
    for _ in range(5):
        output = predict(hparams, chatbot, tokenizer, sentence)
        print(f"\ninput: {sentence}\noutput: {output}")
        sentence = output


def main(hparams):
    tf.keras.utils.set_random_seed(1234)

    data, token = get_dataset(hparams)

    chatbot = model.transformer(hparams)

    optimizer = tf.keras.optimizers.Adam(
        CustomSchedule(d_model=hparams.d_model), beta_1=0.9, beta_2=0.98, epsilon=1e-9
    )

    cross_entropy = tf.keras.losses.SparseCategoricalCrossentropy(
        from_logits=True, reduction="none"
    )

    def loss_function(y_true, y_pred):
        # Mask the loss on padding tokens (ID 0).
        y_true = tf.reshape(y_true, shape=(-1, hparams.max_length - 1))
        loss = cross_entropy(y_true, y_pred)
        mask = tf.cast(tf.not_equal(y_true, 0), dtype=tf.float32)
        loss = tf.multiply(loss, mask)
        return tf.reduce_mean(loss)

    def accuracy(y_true, y_pred):
        y_true = tf.reshape(y_true, shape=(-1, hparams.max_length - 1))
        return tf.keras.metrics.sparse_categorical_accuracy(y_true, y_pred)

    chatbot.compile(optimizer, loss=loss_function, metrics=[accuracy])

    chatbot.fit(data, epochs=hparams.epochs)

    print(f"\nmodel {hparams.save_model}'a kaydediliyor...")
    tf.keras.models.save_model(
        chatbot, filepath=hparams.save_model, include_optimizer=False
    )

    print(
        f"\nclear TensorFlow backend session and load model from {hparams.save_model}..."
    )
    del chatbot
    tf.keras.backend.clear_session()
    chatbot = tf.keras.models.load_model(
        hparams.save_model,
        custom_objects={
            "PositionalEncoding": model.PositionalEncoding,
            "MultiHeadAttention": model.MultiHeadAttention,
        },
        compile=False,
    )
    evaluate(hparams, chatbot, token)


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--save_model", default="model.h5", type=str, help="path to save the model"
    )
    parser.add_argument(
        "--max_samples",
        default=25000,
        type=int,
        help="maximum number of conversation pairs to use",
    )
    parser.add_argument(
        "--max_length", default=40, type=int, help="maximum sentence length"
    )
    parser.add_argument("--batch_size", default=128, type=int)
    parser.add_argument("--num_layers", default=2, type=int)
    parser.add_argument("--num_units", default=512, type=int)
    parser.add_argument("--d_model", default=512, type=int)
    parser.add_argument("--num_heads", default=8, type=int)
    parser.add_argument("--dropout", default=0.1, type=float)
    parser.add_argument("--activation", default="relu", type=str)
    parser.add_argument("--epochs", default=70, type=int)

    main(parser.parse_args())