---
library_name: transformers
tags: []
---

# SUMMARY

Just a model I am using to learn fine-tuning of 'DialoGPT-medium'
- on a self-made dataset
- with self-made special tokens
- fine-tuned multiple times on a ~30K-row dataset (work in progress)

If you are interested in how I got to this point and how I created the datasets, you can visit:  
[Crafting GPT2 for Personalized AI-Preparing Data the Long Way](https://medium.com/@deeokay/the-soul-in-the-machine-crafting-gpt2-for-personalized-ai-9d38be3f635f)


## DECLARING NEW SPECIAL TOKENS 

```python
special_tokens_dict = {
    'eos_token': '<|STOP|>',
    'bos_token': '<|STOP|>',
    'pad_token': '<|PAD|>',
    'additional_special_tokens': ['<|BEGIN_QUERY|>', '<|END_QUERY|>', 
                                  '<|BEGIN_ANALYSIS|>', '<|END_ANALYSIS|>',
                                  '<|BEGIN_RESPONSE|>', '<|END_RESPONSE|>',
                                  '<|BEGIN_SENTIMENT|>', '<|END_SENTIMENT|>',
                                  '<|BEGIN_CLASSIFICATION|>', '<|END_CLASSIFICATION|>',]
}

tokenizer.add_special_tokens(special_tokens_dict)
model.resize_token_embeddings(len(tokenizer))

tokenizer.eos_token_id = tokenizer.convert_tokens_to_ids('<|STOP|>')
tokenizer.bos_token_id = tokenizer.convert_tokens_to_ids('<|STOP|>')
tokenizer.pad_token_id = tokenizer.convert_tokens_to_ids('<|PAD|>')
```
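
As a quick sanity check, you can confirm the new tokens were registered and that the embedding matrix was actually resized. This is a minimal sketch (not part of the original training script), assuming `tokenizer` and `model` are the base 'DialoGPT-medium' tokenizer and model from above:

```python
# Optional sanity check: each special token should map to a single ID,
# and the embedding matrix should match the new vocabulary size.
for tok in ['<|STOP|>', '<|PAD|>', '<|BEGIN_QUERY|>', '<|END_QUERY|>',
            '<|BEGIN_RESPONSE|>', '<|END_RESPONSE|>']:
    print(tok, "->", tokenizer.convert_tokens_to_ids(tok))

print("vocab size:", len(tokenizer))
print("embedding rows:", model.get_input_embeddings().weight.shape[0])
assert tokenizer.eos_token == '<|STOP|>' and tokenizer.pad_token == '<|PAD|>'
```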

The order of tokens is as follows: 

```python
def combine_text(user_prompt, analysis, sentiment, new_response, classification):
    user_q = f"<|STOP|><|BEGIN_QUERY|>{user_prompt}<|END_QUERY|>"
    analysis = f"<|BEGIN_ANALYSIS|>{analysis}<|END_ANALYSIS|>"
    new_response = f"<|BEGIN_RESPONSE|>{new_response}<|END_RESPONSE|>"
    sentiment = f"<|BEGIN_SENTIMENT|>Sentiment: {sentiment}<|END_SENTIMENT|><|STOP|>"
    classification = f"<|BEGIN_CLASSIFICATION|>{classification}<|END_CLASSIFICATION|>"
    return user_q + analysis + new_response + classification + sentiment
```
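
For reference, a fully assembled training example ends up looking like this (the query, analysis, response, sentiment, and classification values below are made up purely for illustration):

```python
# Hypothetical example of what combine_text produces (values are illustrative only)
example = combine_text(
    user_prompt="What is the capital of France?",
    analysis="Simple factual geography question.",
    sentiment="Neutral",
    new_response="The capital of France is Paris.",
    classification="question_answering",
)
print(example)
# The actual output is one continuous string; wrapped here for readability:
# <|STOP|><|BEGIN_QUERY|>What is the capital of France?<|END_QUERY|>
# <|BEGIN_ANALYSIS|>Simple factual geography question.<|END_ANALYSIS|>
# <|BEGIN_RESPONSE|>The capital of France is Paris.<|END_RESPONSE|>
# <|BEGIN_CLASSIFICATION|>question_answering<|END_CLASSIFICATION|>
# <|BEGIN_SENTIMENT|>Sentiment: Neutral<|END_SENTIMENT|><|STOP|>
```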

## INFERENCING

I am currently testing two ways; if anyone knows a better one, please let me know!

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

models_folder = "Deeokay/DialoGPT-special-tokens-medium4"

model = AutoModelForCausalLM.from_pretrained(models_folder)
tokenizer = AutoTokenizer.from_pretrained(models_folder)

# Device configuration <<change as needed>> 
device = torch.device("cpu")
model.to(device)

```
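
If a GPU is available, you could point the device there instead. A minimal sketch, just the standard PyTorch device-selection pattern (not something I have benchmarked here):

```python
# Optional: pick a GPU automatically when one is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()  # inference mode; disables dropout
```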

### OPTION 1 INFERENCE

```python
import time

class Stopwatch:
    def __init__(self):
        self.start_time = None
        self.end_time = None

    def start(self):
        self.start_time = time.time()

    def stop(self):
        self.end_time = time.time()

    def elapsed_time(self):
        if self.start_time is None:
            return "Stopwatch hasn't been started"
        if self.end_time is None:
            return "Stopwatch hasn't been stopped"
        return self.end_time - self.start_time

stopwatch1 = Stopwatch()

def generate_response(input_text, max_length=250):

    stopwatch1.start()

    # Prepare the input
    # input_text = f"<|BEGIN_QUERY|>{input_text}<|END_QUERY|><|BEGIN_ANALYSIS|>{input_text}<|END_ANALYSIS|><|BEGIN_RESPONSE|>"
    input_text = f"<|BEGIN_QUERY|>{input_text}<|END_QUERY|><|BEGIN_ANALYSIS|>"

    input_ids = tokenizer.encode(input_text, return_tensors="pt").to(device)

    # Create attention mask
    attention_mask = torch.ones_like(input_ids).to(device)

    # Generate
    output = model.generate(
        input_ids,
        max_new_tokens=max_length,
        num_return_sequences=1,
        no_repeat_ngram_size=2,
        attention_mask=attention_mask,
        pad_token_id=tokenizer.eos_token_id,
        eos_token_id=tokenizer.convert_tokens_to_ids('<|STOP|>'),
    )

    stopwatch1.stop()
    return tokenizer.decode(output[0], skip_special_tokens=False)
```

### OPTION 2 INFERENCE

```python
# Reuses the Stopwatch class defined in Option 1
stopwatch2 = Stopwatch()

def generate_response2(input_text, max_length=250):

    stopwatch2.start()

    # Prepare the input
    # input_text = f"<|BEGIN_QUERY|>{input_text}<|END_QUERY|><|BEGIN_ANALYSIS|>{input_text}<|END_ANALYSIS|><|BEGIN_RESPONSE|>"
    input_text = f"<|BEGIN_QUERY|>{input_text}<|END_QUERY|><|BEGIN_ANALYSIS|>"
    input_ids = tokenizer.encode(input_text, return_tensors="pt").to(device)

    # Create attention mask
    attention_mask = torch.ones_like(input_ids).to(device)

    # 2ND OPTION FOR: Generate
    output = model.generate(
        input_ids,
        max_new_tokens=max_length,
        attention_mask=attention_mask,
        do_sample=True,
        temperature=0.4,
        top_k=60,
        no_repeat_ngram_size=2,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )

    stopwatch2.stop()
    return tokenizer.decode(output[0], skip_special_tokens=False)
```
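
The main difference between the two options is the decoding strategy: Option 1 decodes greedily, so the same prompt gives the same output every run, while Option 2 samples with `do_sample=True`, `temperature=0.4`, and `top_k=60`, so the responses vary a little between runs.
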
### DECODING ANSWER

When I need just the response:

```python
def decode(text):
    full_text = text

    # Extract the response part
    start_token = "<|BEGIN_RESPONSE|>"
    end_token = "<|END_RESPONSE|>"
    start_idx = full_text.find(start_token)
    end_idx = full_text.find(end_token)

    if start_idx != -1 and end_idx != -1:
        response = full_text[start_idx + len(start_token):end_idx].strip()
    else:
        response = full_text.strip()

    return response
```
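
If I ever need the other segments (analysis, sentiment, classification), the same pattern generalizes. This is just a sketch; the helper name `extract_segment` is my own and is not used elsewhere in this card:

```python
def extract_segment(text, name):
    """Return the text between <|BEGIN_name|> and <|END_name|>, or '' if the markers are missing."""
    start_token = f"<|BEGIN_{name}|>"
    end_token = f"<|END_{name}|>"
    start_idx = text.find(start_token)
    end_idx = text.find(end_token)
    if start_idx == -1 or end_idx == -1:
        return ""
    return text[start_idx + len(start_token):end_idx].strip()

# e.g. extract_segment(response1_full, "ANALYSIS") or extract_segment(response1_full, "SENTIMENT")
```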

### MY SETUP

I use the stopwatch to time the responses, and I run both inference options to see the difference:

```python
input_text = "Who is Steve Jobs and what was his contribution?"
response1_full = generate_response(input_text)
#response1 = decode(response1_full)
print(f"Input: {input_text}")
print("=======================================")
print(f"Response1: {response1_full}")
elapsed1 = stopwatch1.elapsed_time()
print(f"Process took {elapsed1:.4f} seconds")
print("=======================================")
response2_full = generate_response2(input_text)
#response2 = decode(response2_full)
print(f"Response2: {response2_full}")
elapsed2 = stopwatch2.elapsed_time()
print(f"Process took {elapsed2:.4f} seconds")
print("=======================================")
```


### Out-of-Scope Use

Well, anything involving factual data... trust at your own risk!

Never tested on mathematical knowledge.

I quite enjoy how the responses feel closer to what I had in mind.