Locutusque
commited on
Commit
•
4bcc566
1
Parent(s):
059ca04
Update README.md
Browse files
README.md
CHANGED
@@ -25,7 +25,45 @@ This model is intended to be used for generating conversational responses in a v
|
|
25 |
|
26 |
## Training Data
|
27 |
The model is trained on a large dataset of conversational data, consisting of interactions between users and an AI assistant. The data is preprocessed to remove any sensitive information and is formatted in a way that is suitable for training a language model. The training data is split into a training set and a validation set, with the training set used to update the model parameters and the validation set used to evaluate the model performance. The model was trained on 50,000 examples over 125,000 steps, it achieved decent metrics.
|
28 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
29 |
## Model Architecture
|
30 |
The model architecture used in this model is GPT-2, a transformer-based language model that is capable of generating high-quality text with a wide range of styles and tones. The GPT-2 architecture consists of a multi-layered transformer encoder-decoder, with self-attention mechanisms that allow the model to capture long-term dependencies and generate coherent text.
|
31 |
|
|
|
25 |
|
26 |
## Training Data
|
27 |
The model is trained on a large dataset of conversational data, consisting of interactions between users and an AI assistant. The data is preprocessed to remove any sensitive information and is formatted in a way that is suitable for training a language model. The training data is split into a training set and a validation set, with the training set used to update the model parameters and the validation set used to evaluate the model performance. The model was trained on 50,000 examples over 125,000 steps, it achieved decent metrics.
|
28 |
+
This model outperformed the base GPT-2 model significantly on a new conversational dataset during a fine-tuning session. Here is a side-by-side comparison of the two models during the first steps of training
|
29 |
+
```python
|
30 |
+
# Base GPT-2
|
31 |
+
"""
|
32 |
+
Epoch 1/5, Batch 1/10000: Loss - 64.9255, Reward - 260.0000, Penalty - 624.0000, BLEU - 0.0000
|
33 |
+
Epoch 1/5, Batch 2/10000: Loss - 57.4635, Reward - 303.0000, Penalty - 870.0000, BLEU - 0.0000
|
34 |
+
Epoch 1/5, Batch 3/10000: Loss - 67.8061, Reward - 295.0000, Penalty - 908.0000, BLEU - 0.0000
|
35 |
+
Epoch 1/5, Batch 4/10000: Loss - 59.6118, Reward - 800.0000, Penalty - 740.0000, BLEU - 0.0000
|
36 |
+
Epoch 1/5, Batch 5/10000: Loss - 67.4855, Reward - 402.0000, Penalty - 806.0000, BLEU - 0.0000
|
37 |
+
Epoch 1/5, Batch 6/10000: Loss - 29.3718, Reward - 937.0000, Penalty - 760.0000, BLEU - 0.0000
|
38 |
+
Epoch 1/5, Batch 7/10000: Loss - 79.0709, Reward - 390.0000, Penalty - 1114.0000, BLEU - 0.0000
|
39 |
+
Epoch 1/5, Batch 8/10000: Loss - 61.4583, Reward - 385.0000, Penalty - 760.0000, BLEU - 0.0000
|
40 |
+
Epoch 1/5, Batch 9/10000: Loss - 56.3084, Reward - 741.0000, Penalty - 560.0000, BLEU - 3.5500
|
41 |
+
Epoch 1/5, Batch 10/10000: Loss - 80.0192, Reward - 838.0000, Penalty - 1424.0000, BLEU - 0.0000
|
42 |
+
Epoch 1/5, Batch 11/10000: Loss - 51.8236, Reward - 228.0000, Penalty - 812.0000, BLEU - 0.0001
|
43 |
+
Epoch 1/5, Batch 12/10000: Loss - 71.4071, Reward - 541.0000, Penalty - 982.0000, BLEU - 0.0000
|
44 |
+
Epoch 1/5, Batch 13/10000: Loss - 33.3624, Reward - 910.0000, Penalty - 1002.0000, BLEU - 0.0027
|
45 |
+
Epoch 1/5, Batch 14/10000: Loss - 55.9721, Reward - 808.0000, Penalty - 798.0000, BLEU - 0.0005
|
46 |
+
Epoch 1/5, Batch 15/10000: Loss - 67.0336, Reward - 517.0000, Penalty - 764.0000, BLEU - 0.0000
|
47 |
+
"""
|
48 |
+
# Conversational GPT-2
|
49 |
+
"""
|
50 |
+
Epoch 1/5, Batch 1/10000: Loss - 6.1980, Reward - 887.0000, Penalty - 1500.0000, BLEU - 0.0648
|
51 |
+
Epoch 1/5, Batch 2/10000: Loss - 4.5750, Reward - 245.0000, Penalty - 1618.0000, BLEU - 0.0008
|
52 |
+
Epoch 1/5, Batch 3/10000: Loss - 5.1264, Reward - 600.0000, Penalty - 642.0000, BLEU - 5.7981
|
53 |
+
Epoch 1/5, Batch 4/10000: Loss - 0.2995, Reward - 1020.0000, Penalty - 74.0000, BLEU - 13.8469
|
54 |
+
Epoch 1/5, Batch 5/10000: Loss - 7.9377, Reward - 203.0000, Penalty - 1700.0000, BLEU - 0.3218
|
55 |
+
Epoch 1/5, Batch 6/10000: Loss - 5.0522, Reward - 1020.0000, Penalty - 2034.0000, BLEU - 0.1946
|
56 |
+
Epoch 1/5, Batch 7/10000: Loss - 2.0585, Reward - 925.0000, Penalty - 526.0000, BLEU - 16.1298
|
57 |
+
Epoch 1/5, Batch 8/10000: Loss - 5.9736, Reward - 1009.0000, Penalty - 1844.0000, BLEU - 0.0085
|
58 |
+
Epoch 1/5, Batch 9/10000: Loss - 6.0867, Reward - 245.0000, Penalty - 1690.0000, BLEU - 1.9342
|
59 |
+
Epoch 1/5, Batch 10/10000: Loss - 7.8497, Reward - 155.0000, Penalty - 1780.0000, BLEU - 0.0115
|
60 |
+
Epoch 1/5, Batch 11/10000: Loss - 3.8887, Reward - 1012.0000, Penalty - 2010.0000, BLEU - 0.6957
|
61 |
+
Epoch 1/5, Batch 12/10000: Loss - 6.6133, Reward - 216.0000, Penalty - 1638.0000, BLEU - 1.7853
|
62 |
+
Epoch 1/5, Batch 13/10000: Loss - 1.3319, Reward - 945.0000, Penalty - 374.0000, BLEU - 0.0075
|
63 |
+
Epoch 1/5, Batch 14/10000: Loss - 2.6296, Reward - 956.0000, Penalty - 414.0000, BLEU - 3.2207
|
64 |
+
Epoch 1/5, Batch 15/10000: Loss - 6.8827, Reward - 1013.0000, Penalty - 1970.0000, BLEU - 3.7418
|
65 |
+
"""
|
66 |
+
```
|
67 |
## Model Architecture
|
68 |
The model architecture used in this model is GPT-2, a transformer-based language model that is capable of generating high-quality text with a wide range of styles and tones. The GPT-2 architecture consists of a multi-layered transformer encoder-decoder, with self-attention mechanisms that allow the model to capture long-term dependencies and generate coherent text.
|
69 |
|