yusiqo committed · verified
Commit 5d73fef · 1 Parent(s): 5eab1f5

Upload 6 files

Files changed (6)
  1. README.md +111 -3
  2. chatbot.py +141 -0
  3. dataset.py +101 -0
  4. model.py +265 -0
  5. requirements.txt +7 -0
  6. training.py +150 -0
README.md CHANGED
@@ -1,3 +1,111 @@
- ---
- license: mit
- ---
+ # Turkish Chatbot with a Transformer Model
+
+ This repository contains the code for a Turkish chatbot built on a Transformer model. The chatbot is trained on a dataset of Turkish conversations and generates responses to user input.
+
+ ## Files
+
+ * **model.py:** Defines the Transformer architecture, including the encoder, decoder, and attention mechanisms.
+ * **dataset.py:** Contains functions for loading, preprocessing, and tokenizing the conversation dataset.
+ * **chatbot.py:** Handles the interactive chat loop, including response generation and collecting user feedback.
+ * **data/lines.txt:** Stores the raw text lines used for training.
+ * **data/conversations.txt:** Contains the conversation pairs used for training.
+
+ ## Usage
+
+ 1. **Clone the repository:**
+    ```shell
+    git clone https://github.com/EmirhanOzl/transformer-turkish-chatbot.git
+    ```
+ 2. **Install the required dependencies:**
+    ```shell
+    pip install -r requirements.txt
+    ```
+ 3. **Train the model:**
+    ```shell
+    python training.py
+    ```
+ 4. **Run the chatbot:**
+    ```shell
+    python chatbot.py
+    ```
+
+ ## Training Data
+
+ The chatbot is trained on a dataset of Turkish conversations stored in the `data/` directory. You can use the provided dataset or create your own in the same format, described below.
+
+ The dataset consists of two files:
+
+ * **lines.txt:** Contains the raw text lines used for training. Each line has the following format:
+   ```
+   [LINE_ID] +++$+++ [USER_ID] +++$+++ [MOVIE_ID] +++$+++ [CHARACTER_NAME] +++$+++ [TEXT]
+   ```
+
+   * `LINE_ID` is the ID of the line.
+   * `USER_ID` is the ID of the user who speaks the line.
+   * `MOVIE_ID` is the ID of the movie the conversation belongs to (optional).
+   * `CHARACTER_NAME` is the name of the character who says the line (optional).
+   * `TEXT` is the text of the line.
+
+ * **conversations.txt:** Contains the conversation pairs used for training. Each line has the following format:
+   ```
+   [USER_ID] +++$+++ [BOT_ID] +++$+++ [MOVIE_ID] +++$+++ [CONVERSATION]
+   ```
+
+   * `USER_ID` is the ID of the user.
+   * `BOT_ID` is the ID of the chatbot.
+   * `MOVIE_ID` is the ID of the movie the conversation belongs to (optional).
+   * `CONVERSATION` is a comma-separated list of line IDs representing the turns of the conversation.
+
+ **Note:** The `MOVIE_ID` and `CHARACTER_NAME` fields are optional; a default placeholder value can be written in their place.
+
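To make the format concrete, the sketch below mirrors how `load_conversations` in `dataset.py` splits these fields; the two entry strings are hypothetical examples, not lines taken from the shipped dataset:

```python
# Hypothetical entries in the lines.txt / conversations.txt format.
line = "L1 +++$+++ u1 +++$+++ m1 +++$+++ Ben +++$+++ Merhaba nasılsın?"
line_id, user_id, movie_id, character, text = line.split(" +++$+++ ")
# line_id == "L1", text == "Merhaba nasılsın?"

conversation = "u1 +++$+++ u2 +++$+++ m1 +++$+++ ['L1', 'L2']"
parts = conversation.split(" +++$+++ ")
# The last field is a bracketed list of line IDs; strip the brackets and quotes
# to recover the turn order, exactly as dataset.py does.
turn_ids = [t[1:-1] for t in parts[3][1:-1].split(", ")]
# turn_ids == ["L1", "L2"]; consecutive IDs become (question, answer) pairs.
```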
+
+ ## Example Output
+
+ The results below were obtained with the following hyperparameters:
+
+ ```python
+ self.save_model = "model.h5"
+ self.max_samples = 25000
+ self.max_length = 60
+ self.batch_size = 128
+ self.num_layers = 3
+ self.num_units = 512
+ self.d_model = 256
+ self.num_heads = 8
+ self.dropout = 0.1
+ self.activation = "relu"
+ self.epochs = 100
+ ```
+
+ Training results:
+ ```
+ Epoch 100/100
+ 24/24 [==============================] - 192s 8s/step - loss: 0.0145 - accuracy: 0.3174
+ ```
+
+ Sample output:
+
+ **User:** Merhaba nasılsın?
+ **Bot:** Merhaba! Ben iyiyim, teşekkür ederim. Sana nasıl yardımcı olabilirim bugün?
+
+ **User:** Sence de gökyüzü çok güzel değil mi?
+ **Bot:** O zaman hemen başlayalım, bu filmi izlemek için daha fazla beklemek istemiyorum.
+
+ **User:** Sanırım uzaklara gideceğim.
+ **Bot:** Nereye gitmek istersin?
+
+ **User:** Nereye gitmek istersin?
+ **Bot:** Belki de dünya dışına. Sence dünya dışını görebilir miyiz?
+
+ **User:** Belki de dünya dışına. Sence dünya dışını görebilir miyiz?
+ **Bot:** Belki bir gün teknoloji bize bu imkanı sunar, kim bilir?
+
+ **User:** Belki bir gün teknoloji bize bu imkanı sunar, kim bilir?
+ **Bot:** Ne zaman istersen yardımcı olmaktan mutluluk duyarım. Yaratıcı bir süreç geçirmeni dilerim!
+
+ Note: The chatbot's responses may vary depending on the training data and hyperparameters used.
+
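The trained model can also be queried outside the interactive loop in `chatbot.py`. A minimal sketch, assuming training has already produced `model.h5` and the `data/` files are in place; it mirrors `chatbot.main`, which re-derives the tokenizer via `get_dataset`:

```python
# Sketch of programmatic inference, mirroring chatbot.main(); assumes
# training.py has already written model.h5 and that data/ is present.
import argparse

import tensorflow as tf

import model
from chatbot import predict
from dataset import get_dataset

hparams = argparse.Namespace(
    save_model="model.h5",  # path the model was saved to
    max_samples=25000,      # only these three fields are read by get_dataset
    max_length=40,
    batch_size=64,
)

# Rebuilds the subword vocabulary and sets hparams.start_token / end_token / vocab_size.
_, tokenizer = get_dataset(hparams)

chatbot = tf.keras.models.load_model(
    hparams.save_model,
    custom_objects={
        "PositionalEncoding": model.PositionalEncoding,
        "MultiHeadAttention": model.MultiHeadAttention,
    },
    compile=False,
)

print(predict(hparams, chatbot, tokenizer, "Merhaba nasılsın?"))
```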
chatbot.py ADDED
@@ -0,0 +1,141 @@
+ import argparse
+ import tensorflow as tf
+ import model
+ from dataset import get_dataset, preprocess_sentence
+
+
+ def inference(hparams, chatbot, tokenizer, sentence):
+     """Greedily decode a response, one token at a time."""
+     sentence = preprocess_sentence(sentence)
+
+     sentence = tf.expand_dims(
+         hparams.start_token + tokenizer.encode(sentence) + hparams.end_token, axis=0
+     )
+
+     output = tf.expand_dims(hparams.start_token, 0)
+
+     for _ in range(hparams.max_length):
+         predictions = chatbot(inputs=[sentence, output], training=False)
+
+         # keep only the logits for the last predicted position
+         predictions = predictions[:, -1:, :]
+         predicted_id = tf.cast(tf.argmax(predictions, axis=-1), tf.int32)
+
+         if tf.equal(predicted_id, hparams.end_token[0]):
+             break
+
+         output = tf.concat([output, predicted_id], axis=-1)
+
+     return tf.squeeze(output, axis=0)
+
+
+ def predict(hparams, chatbot, tokenizer, sentence):
+     prediction = inference(hparams, chatbot, tokenizer, sentence)
+     predicted_sentence = tokenizer.decode(
+         [i for i in prediction if i < tokenizer.vocab_size]
+     )
+     return predicted_sentence
+
+
+ def read_file(file_path):
+     with open(file_path, 'r', encoding='utf-8') as file:
+         lines = file.readlines()
+     return lines
+
+
+ def append_to_file(file_path, line):
+     with open(file_path, 'a', encoding='utf-8') as file:
+         file.write(f"{line}\n")
+
+
+ def get_last_ids(lines_file, conversations_file):
+     """Read the most recent line/user/movie ids so new entries continue the numbering."""
+     lines = read_file(lines_file)
+     conversations = read_file(conversations_file)
+
+     last_line = lines[-1]
+     last_conversation = conversations[-1]
+
+     last_line_id = int(last_line.split(" +++$+++ ")[0][1:])
+     last_user_id = int(last_conversation.split(" +++$+++ ")[1][1:])
+     last_movie_id = int(last_conversation.split(" +++$+++ ")[2][1:])
+
+     return last_line_id, last_user_id, last_movie_id
+
+
+ def update_data_files(user_input, bot_response, lines_file='data/lines.txt', conversations_file='data/conversations.txt'):
+     """Append an accepted exchange to the training data files."""
+     last_line_id, last_user_id, last_movie_id = get_last_ids(lines_file, conversations_file)
+
+     new_line_id = f"L{last_line_id + 1}"
+     new_bot_line_id = f"L{last_line_id + 2}"
+     new_user_id = f"u{last_user_id + 1}"
+     new_bot_user_id = f"u{last_user_id + 2}"
+     new_movie_id = f"m{last_movie_id + 1}"
+
+     append_to_file(lines_file, f"{new_line_id} +++$+++ {new_user_id} +++$+++ {new_movie_id} +++$+++ Ben +++$+++ {user_input}")
+     append_to_file(lines_file, f"{new_bot_line_id} +++$+++ {new_bot_user_id} +++$+++ {new_movie_id} +++$+++ Bot +++$+++ {bot_response}")
+
+     new_conversation = f"{new_user_id} +++$+++ {new_bot_user_id} +++$+++ {new_movie_id} +++$+++ ['{new_line_id}', '{new_bot_line_id}']"
+     append_to_file(conversations_file, new_conversation)
+
+
+ def get_feedback():
+     feedback = input("Bu cevap yardımcı oldu mu? (Evet/Hayır): ").lower()
+     return feedback == "evet"  # the input is lowercased above
+
+
+ def chat(hparams, chatbot, tokenizer):
+     print("\nCHATBOT")
+
+     for _ in range(5):
+         sentence = input("Sen: ")
+         output = predict(hparams, chatbot, tokenizer, sentence)
+         print(f"\nBOT: {output}")
+
+         # only exchanges confirmed as helpful are appended to the data files
+         if get_feedback():
+             update_data_files(sentence, output)
+
+
+ def main(hparams):
+     # rebuild the dataset only to recover the tokenizer and start/end token ids
+     _, tokenizer = get_dataset(hparams)
+
+     tf.keras.backend.clear_session()
+     chatbot = tf.keras.models.load_model(
+         hparams.save_model,
+         custom_objects={
+             "PositionalEncoding": model.PositionalEncoding,
+             "MultiHeadAttention": model.MultiHeadAttention,
+         },
+         compile=False,
+     )
+
+     chat(hparams, chatbot, tokenizer)
+
+
+ if __name__ == "__main__":
+
+     parser = argparse.ArgumentParser()
+     parser.add_argument(
+         "--save_model", default="model.h5", type=str, help="path to save the model"
+     )
+     parser.add_argument(
+         "--max_samples",
+         default=25000,
+         type=int,
+         help="maximum number of conversation pairs to use",
+     )
+     parser.add_argument(
+         "--max_length", default=40, type=int, help="maximum sentence length"
+     )
+     parser.add_argument("--batch_size", default=64, type=int)
+     parser.add_argument("--num_layers", default=2, type=int)
+     parser.add_argument("--num_units", default=512, type=int)
+     parser.add_argument("--d_model", default=256, type=int)
+     parser.add_argument("--num_heads", default=8, type=int)
+     parser.add_argument("--dropout", default=0.1, type=float)
+     parser.add_argument("--activation", default="relu", type=str)
+     parser.add_argument("--epochs", default=80, type=int)
+
+     main(parser.parse_args())
dataset.py ADDED
@@ -0,0 +1,101 @@
+ import re
+ import tensorflow as tf
+ import tensorflow_datasets as tfds
+ import nltk
+ from nltk.stem import WordNetLemmatizer
+
+ nltk.download('wordnet')
+ nltk.download('punkt')
+
+ lemmatizer = WordNetLemmatizer()
+
+
+ def preprocess_sentence(sentence):
+     sentence = sentence.lower().strip()
+
+     # pad sentence punctuation with spaces, collapse repeated whitespace, drop other symbols
+     sentence = re.sub(r"([?.!¿])", r" \1 ", sentence)
+     sentence = re.sub(r'[" "]+', " ", sentence)
+     sentence = re.sub(r"[-()\"#/@;:<>{}+=~|.?,]", "", sentence)
+
+     # keep only Latin and Turkish letters plus basic punctuation
+     sentence = re.sub(r"[^a-zA-ZğüşöçıİĞÜŞÖÇ?.!,¿]+", " ", sentence)
+     sentence = sentence.strip()
+
+     # tokenize and lemmatize (WordNetLemmatizer leaves most Turkish forms unchanged)
+     sentence = ' '.join([lemmatizer.lemmatize(w) for w in nltk.word_tokenize(sentence)])
+
+     return sentence
+
+
+ def load_conversations(hparams, lines_file, conversations_file):
+     id2line = {}
+
+     with open(lines_file, encoding="utf-8", errors="ignore") as file:
+         lines = file.readlines()
+
+     for line in lines:
+         parts = line.replace("\n", "").split(" +++$+++ ")
+         id2line[parts[0]] = parts[4]
+
+     questions = []
+     answers = []
+
+     with open(conversations_file, "r", encoding="utf-8") as file:
+         lines = file.readlines()
+         for line in lines:
+             parts = line.replace("\n", "").split(" +++$+++ ")
+             # the last field looks like "['L1', 'L2', ...]"; strip the brackets and quotes
+             conversation = [line_id[1:-1] for line_id in parts[3][1:-1].split(", ")]
+             for i in range(len(conversation) - 1):
+                 questions.append(preprocess_sentence(id2line[conversation[i]]))
+                 answers.append(preprocess_sentence(id2line[conversation[i + 1]]))
+                 if len(questions) >= hparams.max_samples:
+                     return questions, answers
+
+     return questions, answers
+
+
+ def tokenize(hparams, tokenizer, questions, answers):
+     tokenized_inputs, tokenized_outputs = [], []
+
+     for (question, answer) in zip(questions, answers):
+         # wrap every sentence with the start and end token ids
+         sentence1 = hparams.start_token + tokenizer.encode(question) + hparams.end_token
+         sentence2 = hparams.start_token + tokenizer.encode(answer) + hparams.end_token
+
+         if len(sentence1) <= hparams.max_length and len(sentence2) <= hparams.max_length:
+             tokenized_inputs.append(sentence1)
+             tokenized_outputs.append(sentence2)
+
+     tokenized_inputs = tf.keras.preprocessing.sequence.pad_sequences(
+         tokenized_inputs, maxlen=hparams.max_length, padding="post")
+     tokenized_outputs = tf.keras.preprocessing.sequence.pad_sequences(
+         tokenized_outputs, maxlen=hparams.max_length, padding="post")
+
+     return tokenized_inputs, tokenized_outputs
+
+
+ def get_dataset(hparams):
+     lines_file = "data/lines.txt"
+     conversations_file = "data/conversations.txt"
+
+     questions, answers = load_conversations(hparams, lines_file, conversations_file)
+
+     # build a subword vocabulary from the corpus and persist it next to the model
+     tokenizer = tfds.deprecated.text.SubwordTextEncoder.build_from_corpus(questions + answers, target_vocab_size=2**13)
+     tokenizer.save_to_file('tokenizer')
+
+     hparams.start_token = [tokenizer.vocab_size]
+     hparams.end_token = [tokenizer.vocab_size + 1]
+     hparams.vocab_size = tokenizer.vocab_size + 2
+
+     questions, answers = tokenize(hparams, tokenizer, questions, answers)
+
+     # teacher forcing: decoder input is the answer shifted right, the target is shifted left
+     dataset = tf.data.Dataset.from_tensor_slices(
+         ({"inputs": questions, "dec_inputs": answers[:, :-1]}, answers[:, 1:])
+     )
+
+     dataset = dataset.cache()
+     dataset = dataset.shuffle(len(questions))
+     dataset = dataset.batch(hparams.batch_size)
+     dataset = dataset.prefetch(tf.data.experimental.AUTOTUNE)
+
+     return dataset, tokenizer
+
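To illustrate the tokenization convention used above (the start and end ids sit just past the learned subword vocabulary, and are filtered out again when decoding), a small self-contained sketch; the two-sentence corpus is illustrative only:

```python
# Sketch of the encoding performed in tokenize(); the tiny corpus is made up.
import tensorflow_datasets as tfds

corpus = ["merhaba nasılsın ?", "iyiyim teşekkür ederim ."]
tokenizer = tfds.deprecated.text.SubwordTextEncoder.build_from_corpus(
    corpus, target_vocab_size=2**13
)

start_token = [tokenizer.vocab_size]      # one id past the learned vocabulary
end_token = [tokenizer.vocab_size + 1]    # the id after that
vocab_size = tokenizer.vocab_size + 2     # size used by the model's output layer

encoded = start_token + tokenizer.encode("merhaba nasılsın ?") + end_token
decoded = tokenizer.decode([i for i in encoded if i < tokenizer.vocab_size])
# decoded round-trips to "merhaba nasılsın ?" once the start/end ids are filtered out
```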
model.py ADDED
@@ -0,0 +1,265 @@
+ import tensorflow as tf
+
+
+ def sdp_attention(query, key, value, mask):
+     """Scaled dot-product attention."""
+     matmul_qk = tf.matmul(query, key, transpose_b=True)
+     depth = tf.cast(tf.shape(key)[-1], tf.float32)
+     logits = matmul_qk / tf.math.sqrt(depth)
+     if mask is not None:
+         logits += mask * -1e9  # mask out padded / future positions
+     attention_weights = tf.nn.softmax(logits, axis=-1)
+     output = tf.matmul(attention_weights, value)
+     return output
+
+
+ class MultiHeadAttention(tf.keras.layers.Layer):
+     def __init__(self, num_heads, d_model, **kwargs):
+         assert d_model % num_heads == 0
+         super(MultiHeadAttention, self).__init__(**kwargs)
+         self.num_heads = num_heads
+         self.d_model = d_model
+
+         self.depth = self.d_model // self.num_heads
+
+         self.query_dense = tf.keras.layers.Dense(self.d_model)
+         self.key_dense = tf.keras.layers.Dense(self.d_model)
+         self.value_dense = tf.keras.layers.Dense(self.d_model)
+
+         self.dense = tf.keras.layers.Dense(self.d_model)
+
+     def get_config(self):
+         config = super(MultiHeadAttention, self).get_config()
+         config.update({"num_heads": self.num_heads, "d_model": self.d_model})
+         return config
+
+     def split_heads(self, inputs: tf.Tensor, batch_size: int):
+         # (batch, seq_len, d_model) -> (batch, num_heads, seq_len, depth)
+         inputs = tf.keras.layers.Lambda(
+             lambda inputs: tf.reshape(
+                 inputs, shape=(batch_size, -1, self.num_heads, self.depth))
+         )(inputs)
+         return tf.keras.layers.Lambda(
+             lambda inputs: tf.transpose(inputs, perm=[0, 2, 1, 3])
+         )(inputs)
+
+     def call(self, inputs: tf.Tensor):
+         query, key, value, mask = (
+             inputs["query"],
+             inputs["key"],
+             inputs["value"],
+             inputs["mask"],
+         )
+         batch_size = tf.shape(query)[0]
+
+         query = self.query_dense(query)
+         key = self.key_dense(key)
+         value = self.value_dense(value)
+
+         query = self.split_heads(query, batch_size)
+         key = self.split_heads(key, batch_size)
+         value = self.split_heads(value, batch_size)
+
+         scaled_attention = sdp_attention(query, key, value, mask)
+         scaled_attention = tf.keras.layers.Lambda(
+             lambda scaled_attention: tf.transpose(scaled_attention, perm=[0, 2, 1, 3])
+         )(scaled_attention)
+
+         # concatenate the heads back into a (batch, seq_len, d_model) tensor
+         concat_attention = tf.keras.layers.Lambda(
+             lambda scaled_attention: tf.reshape(
+                 scaled_attention, (batch_size, -1, self.d_model)
+             )
+         )(scaled_attention)
+
+         outputs = self.dense(concat_attention)
+
+         return outputs
+
+
+ def create_padding_mask(x):
+     # 1.0 where the token is padding (id 0), broadcastable against attention logits
+     mask = tf.cast(tf.math.equal(x, 0), dtype=tf.float32)
+     return mask[:, tf.newaxis, tf.newaxis, :]
+
+
+ def create_look_ahead_mask(x):
+     # combine the causal (upper-triangular) mask with the padding mask
+     seq_len = tf.shape(x)[1]
+     look_ahead_mask = 1 - tf.linalg.band_part(
+         tf.ones((seq_len, seq_len), dtype=tf.float32), -1, 0
+     )
+     padding_mask = create_padding_mask(x)
+     return tf.maximum(look_ahead_mask, padding_mask)
+
+
+ class PositionalEncoding(tf.keras.layers.Layer):
+     def __init__(self, position: int, d_model: int, **kwargs):
+         super(PositionalEncoding, self).__init__(**kwargs)
+         self.position = position
+         self.d_model = d_model
+         self.pos_encoding = self.positional_encoding(position, d_model)
+
+     def get_config(self):
+         config = super(PositionalEncoding, self).get_config()
+         config.update({"position": self.position, "d_model": self.d_model})
+         return config
+
+     def get_angles(self, position: tf.Tensor, i: tf.Tensor, d_model: tf.Tensor):
+         angles = 1 / tf.pow(10000, (2 * (i // 2)) / d_model)
+         return position * angles
+
+     def positional_encoding(self, position: int, d_model: int):
+         angle_rads = self.get_angles(
+             position=tf.cast(tf.range(position)[:, tf.newaxis], dtype=tf.float32),
+             i=tf.cast(tf.range(d_model)[tf.newaxis, :], dtype=tf.float32),
+             d_model=tf.cast(d_model, dtype=tf.float32),
+         )
+         # sine on even indices, cosine on odd indices
+         sines = tf.math.sin(angle_rads[:, 0::2])
+         cosines = tf.math.cos(angle_rads[:, 1::2])
+
+         pos_encoding = tf.concat([sines, cosines], axis=-1)
+         pos_encoding = pos_encoding[tf.newaxis, ...]
+         return pos_encoding
+
+     def call(self, inputs: tf.Tensor):
+         return inputs + self.pos_encoding[:, : tf.shape(inputs)[1], :]
+
+
+ def encoder_layer(hparams, name: str = "encoder_layer"):
+     inputs = tf.keras.Input(shape=(None, hparams.d_model), name="inputs")
+     padding_mask = tf.keras.Input(shape=(1, 1, None), name="padding_mask")
+
+     # self-attention with residual connection and layer normalization
+     attention = MultiHeadAttention(
+         num_heads=hparams.num_heads, d_model=hparams.d_model, name="attention"
+     )({"query": inputs, "key": inputs, "value": inputs, "mask": padding_mask})
+     attention = tf.keras.layers.Dropout(hparams.dropout)(attention)
+     attention += tf.cast(inputs, dtype=tf.float32)
+     attention = tf.keras.layers.LayerNormalization(epsilon=1e-6)(attention)
+
+     # position-wise feed-forward network
+     outputs = tf.keras.layers.Dense(hparams.num_units, activation=hparams.activation)(
+         attention
+     )
+     outputs = tf.keras.layers.Dense(hparams.d_model)(outputs)
+     outputs = tf.keras.layers.Dropout(hparams.dropout)(outputs)
+     outputs += attention
+     outputs = tf.keras.layers.LayerNormalization(epsilon=1e-6)(outputs)
+
+     return tf.keras.Model(inputs=[inputs, padding_mask], outputs=outputs, name=name)
+
+
+ def encoder(hparams, name: str = "encoder"):
+     inputs = tf.keras.Input(shape=(None,), name="inputs")
+     padding_mask = tf.keras.Input(shape=(1, 1, None), name="padding_mask")
+
+     embeddings = tf.keras.layers.Embedding(hparams.vocab_size, hparams.d_model)(inputs)
+     embeddings *= tf.math.sqrt(tf.cast(hparams.d_model, dtype=tf.float32))
+     embeddings = PositionalEncoding(
+         position=hparams.vocab_size, d_model=hparams.d_model
+     )(embeddings)
+
+     outputs = tf.keras.layers.Dropout(hparams.dropout)(embeddings)
+
+     for i in range(hparams.num_layers):
+         outputs = encoder_layer(hparams, name=f"encoder_layer_{i}")(
+             [outputs, padding_mask]
+         )
+
+     return tf.keras.Model(inputs=[inputs, padding_mask], outputs=outputs, name=name)
+
+
+ def decoder_layer(hparams, name: str = "decoder_layer"):
+     inputs = tf.keras.Input(shape=(None, hparams.d_model), name="inputs")
+     enc_outputs = tf.keras.Input(shape=(None, hparams.d_model), name="encoder_outputs")
+     look_ahead_mask = tf.keras.Input(shape=(1, None, None), name="look_ahead_mask")
+     padding_mask = tf.keras.Input(shape=(1, 1, None), name="padding_mask")
+
+     # masked self-attention over the decoder inputs
+     attention1 = MultiHeadAttention(
+         num_heads=hparams.num_heads, d_model=hparams.d_model, name="attention_1"
+     )(
+         inputs={
+             "query": inputs,
+             "key": inputs,
+             "value": inputs,
+             "mask": look_ahead_mask,
+         }
+     )
+     attention1 += tf.cast(inputs, dtype=tf.float32)
+     attention1 = tf.keras.layers.LayerNormalization(epsilon=1e-6)(attention1)
+
+     # encoder-decoder attention with a single residual connection
+     attention2 = MultiHeadAttention(
+         num_heads=hparams.num_heads, d_model=hparams.d_model, name="attention_2"
+     )(
+         inputs={
+             "query": attention1,
+             "key": enc_outputs,
+             "value": enc_outputs,
+             "mask": padding_mask,
+         }
+     )
+     attention2 = tf.keras.layers.Dropout(hparams.dropout)(attention2)
+     attention2 = tf.keras.layers.LayerNormalization(epsilon=1e-6)(
+         attention2 + attention1
+     )
+
+     # position-wise feed-forward network
+     outputs = tf.keras.layers.Dense(hparams.num_units, activation=hparams.activation)(
+         attention2
+     )
+     outputs = tf.keras.layers.Dense(hparams.d_model)(outputs)
+     outputs = tf.keras.layers.Dropout(hparams.dropout)(outputs)
+     outputs += attention2
+     outputs = tf.keras.layers.LayerNormalization(epsilon=1e-6)(outputs)
+
+     return tf.keras.Model(
+         inputs=[inputs, enc_outputs, look_ahead_mask, padding_mask],
+         outputs=outputs,
+         name=name,
+     )
+
+
+ def decoder(hparams, name: str = "decoder"):
+     inputs = tf.keras.Input(shape=(None,), name="inputs")
+     enc_outputs = tf.keras.Input(shape=(None, hparams.d_model), name="encoder_outputs")
+     look_ahead_mask = tf.keras.Input(shape=(1, None, None), name="look_ahead_mask")
+     padding_mask = tf.keras.Input(shape=(1, 1, None), name="padding_mask")
+
+     embeddings = tf.keras.layers.Embedding(hparams.vocab_size, hparams.d_model)(inputs)
+     embeddings *= tf.math.sqrt(tf.cast(hparams.d_model, dtype=tf.float32))
+     embeddings = PositionalEncoding(
+         position=hparams.vocab_size, d_model=hparams.d_model
+     )(embeddings)
+
+     outputs = tf.keras.layers.Dropout(hparams.dropout)(embeddings)
+
+     for i in range(hparams.num_layers):
+         outputs = decoder_layer(
+             hparams,
+             name="decoder_layer_{}".format(i),
+         )(inputs=[outputs, enc_outputs, look_ahead_mask, padding_mask])
+
+     return tf.keras.Model(
+         inputs=[inputs, enc_outputs, look_ahead_mask, padding_mask],
+         outputs=outputs,
+         name=name,
+     )
+
+
+ def transformer(hparams, name: str = "transformer"):
+     inputs = tf.keras.Input(shape=(None,), name="inputs")
+     dec_inputs = tf.keras.Input(shape=(None,), name="dec_inputs")
+
+     enc_padding_mask = tf.keras.layers.Lambda(
+         create_padding_mask, output_shape=(1, 1, None), name="enc_padding_mask"
+     )(inputs)
+
+     look_ahead_mask = tf.keras.layers.Lambda(
+         create_look_ahead_mask, output_shape=(1, None, None), name="look_ahead_mask"
+     )(dec_inputs)
+
+     dec_padding_mask = tf.keras.layers.Lambda(
+         create_padding_mask, output_shape=(1, 1, None), name="dec_padding_mask"
+     )(inputs)
+
+     enc_outputs = encoder(hparams)(inputs=[inputs, enc_padding_mask])
+
+     dec_outputs = decoder(hparams)(
+         inputs=[dec_inputs, enc_outputs, look_ahead_mask, dec_padding_mask]
+     )
+
+     outputs = tf.keras.layers.Dense(hparams.vocab_size, name="outputs")(dec_outputs)
+
+     return tf.keras.Model(inputs=[inputs, dec_inputs], outputs=outputs, name=name)
+
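As a quick sanity check of how these pieces compose, the sketch below builds the model with small, illustrative hyperparameters and runs one dummy forward pass; the real values come from the argparse flags in `training.py`, and the vocabulary size is normally set by `get_dataset`:

```python
# Illustrative-only hyperparameters; the real ones are set in training.py / dataset.py.
import argparse

import tensorflow as tf

import model

hparams = argparse.Namespace(
    vocab_size=8192 + 2,  # subword vocabulary plus the start/end token ids
    num_layers=2, num_units=512, d_model=256, num_heads=8,
    dropout=0.1, activation="relu",
)

chatbot = model.transformer(hparams)

# Two inputs: encoder token ids and (shifted) decoder token ids.
dummy_inputs = tf.ones((1, 10), dtype=tf.int32)
dummy_dec_inputs = tf.ones((1, 9), dtype=tf.int32)
logits = chatbot([dummy_inputs, dummy_dec_inputs])
print(logits.shape)  # (1, 9, vocab_size): one distribution per decoder position
```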
requirements.txt ADDED
@@ -0,0 +1,7 @@
+ jupyter
+ matplotlib
+ tensorflow==2.9.1
+ tensorflow-addons==0.17.1
+ tensorflow-datasets==4.6.0
+ protobuf==3.20.3
+ nltk
training.py ADDED
@@ -0,0 +1,150 @@
+ import argparse
+ import tensorflow as tf
+ import model
+ from dataset import get_dataset, preprocess_sentence
+
+
+ class CustomSchedule(tf.keras.optimizers.schedules.LearningRateSchedule):
+     """Warmup learning-rate schedule from "Attention Is All You Need"."""
+
+     def __init__(self, d_model: int, warmup_steps: int = 4000):
+         super(CustomSchedule, self).__init__()
+         self.d_model = tf.cast(d_model, dtype=tf.float32)
+         self.warmup_steps = warmup_steps
+
+     def __call__(self, step):
+         arg1 = tf.math.rsqrt(step)
+         arg2 = step * self.warmup_steps**-1.5
+         return tf.math.rsqrt(self.d_model) * tf.math.minimum(arg1, arg2)
+
+
+ def inference(hparams, chatbot, tokenizer, sentence):
+     """Greedily decode a response, one token at a time."""
+     sentence = preprocess_sentence(sentence)
+
+     sentence = tf.expand_dims(
+         hparams.start_token + tokenizer.encode(sentence) + hparams.end_token, axis=0
+     )
+
+     output = tf.expand_dims(hparams.start_token, 0)
+
+     for _ in range(hparams.max_length):
+         predictions = chatbot(inputs=[sentence, output], training=False)
+
+         # keep only the logits for the last predicted position
+         predictions = predictions[:, -1:, :]
+         predicted_id = tf.cast(tf.argmax(predictions, axis=-1), tf.int32)
+
+         if tf.equal(predicted_id, hparams.end_token[0]):
+             break
+
+         output = tf.concat([output, predicted_id], axis=-1)
+
+     return tf.squeeze(output, axis=0)
+
+
+ def predict(hparams, chatbot, tokenizer, sentence):
+     prediction = inference(hparams, chatbot, tokenizer, sentence)
+     predicted_sentence = tokenizer.decode(
+         [i for i in prediction if i < tokenizer.vocab_size]
+     )
+     return predicted_sentence
+
+
+ def evaluate(hparams, chatbot, tokenizer):
+     print("\nEvaluate")
+     sentence = "Merhaba nasılsın?"
+     output = predict(hparams, chatbot, tokenizer, sentence)
+     print(f"input: {sentence}\noutput: {output}")
+
+     sentence = "Sence de gökyüzü çok güzel değil mi?"
+     output = predict(hparams, chatbot, tokenizer, sentence)
+     print(f"\ninput: {sentence}\noutput: {output}")
+
+     # feed the model its own replies for a few turns
+     sentence = "Sanırım uzaklara gideceğim."
+     for _ in range(5):
+         output = predict(hparams, chatbot, tokenizer, sentence)
+         print(f"\ninput: {sentence}\noutput: {output}")
+         sentence = output
+
+
+ def main(hparams):
+     tf.keras.utils.set_random_seed(1234)
+
+     data, tokenizer = get_dataset(hparams)
+
+     chatbot = model.transformer(hparams)
+
+     optimizer = tf.keras.optimizers.Adam(
+         CustomSchedule(d_model=hparams.d_model), beta_1=0.9, beta_2=0.98, epsilon=1e-9
+     )
+
+     cross_entropy = tf.keras.losses.SparseCategoricalCrossentropy(
+         from_logits=True, reduction="none"
+     )
+
+     def loss_function(y_true, y_pred):
+         # mask the loss contribution of padding tokens (id 0)
+         y_true = tf.reshape(y_true, shape=(-1, hparams.max_length - 1))
+         loss = cross_entropy(y_true, y_pred)
+         mask = tf.cast(tf.not_equal(y_true, 0), dtype=tf.float32)
+         loss = tf.multiply(loss, mask)
+         return tf.reduce_mean(loss)
+
+     def accuracy(y_true, y_pred):
+         y_true = tf.reshape(y_true, shape=(-1, hparams.max_length - 1))
+         return tf.keras.metrics.sparse_categorical_accuracy(y_true, y_pred)
+
+     chatbot.compile(optimizer, loss=loss_function, metrics=[accuracy])
+
+     chatbot.fit(data, epochs=hparams.epochs)
+
+     print(f"\nsaving model to {hparams.save_model}...")
+     tf.keras.models.save_model(
+         chatbot, filepath=hparams.save_model, include_optimizer=False
+     )
+
+     print(
+         f"\nclear TensorFlow backend session and load model from {hparams.save_model}..."
+     )
+     del chatbot
+     tf.keras.backend.clear_session()
+     chatbot = tf.keras.models.load_model(
+         hparams.save_model,
+         custom_objects={
+             "PositionalEncoding": model.PositionalEncoding,
+             "MultiHeadAttention": model.MultiHeadAttention,
+         },
+         compile=False,
+     )
+     evaluate(hparams, chatbot, tokenizer)
+
+
+ if __name__ == "__main__":
+
+     parser = argparse.ArgumentParser()
+     parser.add_argument(
+         "--save_model", default="model.h5", type=str, help="path to save the model"
+     )
+     parser.add_argument(
+         "--max_samples",
+         default=25000,
+         type=int,
+         help="maximum number of conversation pairs to use",
+     )
+     parser.add_argument(
+         "--max_length", default=40, type=int, help="maximum sentence length"
+     )
+     parser.add_argument("--batch_size", default=128, type=int)
+     parser.add_argument("--num_layers", default=2, type=int)
+     parser.add_argument("--num_units", default=512, type=int)
+     parser.add_argument("--d_model", default=512, type=int)
+     parser.add_argument("--num_heads", default=8, type=int)
+     parser.add_argument("--dropout", default=0.1, type=float)
+     parser.add_argument("--activation", default="relu", type=str)
+     parser.add_argument("--epochs", default=70, type=int)
+
+     main(parser.parse_args())
+
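For reference, `CustomSchedule` implements the warmup schedule from the original Transformer paper: the learning rate rises linearly for `warmup_steps` steps and then decays as the inverse square root of the step. A small standalone sketch of the same formula (using d_model=256 as in the README example; the printed values are illustrative, not repo output):

```python
import tensorflow as tf

def lr(step, d_model=256.0, warmup_steps=4000.0):
    # Same formula as CustomSchedule.__call__ above.
    return tf.math.rsqrt(d_model) * tf.math.minimum(
        tf.math.rsqrt(step), step * warmup_steps**-1.5
    )

for step in [1.0, 4000.0, 100000.0]:
    print(step, float(lr(tf.constant(step))))
# The rate peaks around step 4000 (the warmup boundary) and then decays as 1/sqrt(step).
```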