Greg Thompson committed on
Commit
e90c2e9
1 Parent(s): fc1ccfb

Add initial intent classification model to nlu endpoint

mathtext_fastapi/data/intent_classification_model.joblib ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ea4954368c3b95673167ce347f2962b5508c4af295b6af58b6c11b3c1075b42e
+ size 127903
mathtext_fastapi/data/labeled_data.csv ADDED
@@ -0,0 +1,144 @@
+ Utterance,Label
+ skip this,skip
+ this is stupid,skip
+ this is stupid,harder
+ this is stupid,feedback
+ I'm done,exit
+ quit,exit
+ I don't know,hint
+ help,hint
+ can I do something else?,main menu
+ what's going on,rapport
+ what's going on,main menu
+ tell me a joke,rapport
+ tell me a joke,main menu
+ Sorry I don't understand,do not know
+ Ten thousand,number
+ 1.234,number
+ "10,000",number
+ "123, 456",numbers
+ "11, 12, 13",numbers
+ "100, 200, 300",numbers
+ "100, 200",numbers
+ Stop for a minute,wait
+ Bye bye,exit
+ Good night,exit
+ Am done,exit
+ Yes,yes
+ Help,help
+ Idiot,harder
+ Stop,exit
+ I don't get it,hint
+ Math,main menu
+ Math,math topic
+ Tomorrow let do math,wait
+ Later,wait
+ Pls i will continue pls,skip
+ Rori tell me now,help
+ harder,skip
+ Stop for now i wont to go to School,exit
+ Next,next
+ Okay,okay
+ Great,affirmation
+ Give me for example,example
+ No I want to learn algebraic expressions,algebra
+ Hi rori,greeting
+ *help*,help
+ *Next*,next
+ Okay nice,okay
+ I don't know it,hint
+ Nex,next
+ I need a help,hint
+ Please can I ask your any math questions?,faq
+ The answer is 1,answer
+ The answer is 1,number
+ But 0.8 is also same as . 8 so I was actually right,I'm right
+ What is the number system?,faq
+ Ok thanks,thanks
+ I'm going to school now,exit
+ Let's move to another topic,main menu
+ "Ummanni saba
+ Kebena bara kana galmi keenya inni guddaan bilisummaa qofa #Gabrummaan_ammaan booda_gaha namni hundi bakka jiru irraa kutatee ka,ee jira obboleewwan goototni keenya jiran haqa Kebenaaf jechaa jiru Guraandhala 29 booda walabummaa keenya labsina Dhugaa qabna Ni injifanna *** . Naannoo giddu galeessa Itoophiyaatti #Kebenaan aanaa addaati Kun murtoo ummata Kebenaa hundaati",spam
+ Yes it,yes
+ U type fast,too fast
+ I mean your typing is fast,too fast
+ Why do u type so fast,too fast
+ Ur typing is fast,too fast
+ Can we go to a real work,harder
+ I know all this,harder
+ Answer this,preamble
+ Am tired,exit
+ This is not what I asked for,main menu
+ Bye,exit
+ 😱😱😂😂😂😡😰😰😰😒,spam
+ Gbxbxbcbcbbcbchcbchc,spam
+ I want to solve math,math topic
+ Pleas let start with the fraction,fractions topic
+ Okey,okay
+ i need substraction,subtraction topic
+ Can you please stop with me,exit
+ Another one,next
+ Harder or easy,main menu
+ Hard or easier,main menu
+ Jump topic,menu
+ Got it,okay
+ I didn't understand,don't know
+ Don't understand,don't know
+ Excuse me pls,hint
+ Let stop for today,exit
+ Help and stop asking me stupid questions,
+ Ykay,okay
+ Not interested in solving this,menu
+ Stpo,exit
+ Hiiiiiii,greeting
+ Hi rori,greeting
+ I've done this things before,harder
+ Which number my phone number,
+ Unit,main menu
+ No ide,don't know
+ No ide,hint
+ No idea,don't know
+ 🙈🤩😇🙏,spam
+ Thank u,thanks
+ Do you know programming,faq
+ Delete my number,unsubscribe
+ See u,exit
+ Can I go for break ??,wait
+ I wanna fuck,profanity
+ Enough of this nw,exit
+ Can we move to equations,equations
+ Do you know you are an idiot,insult
+ 3 digit number,number
+ 3 digit number,answer
+ Three digit number,confident answer
+ Three digit number,number
+ Good evening Rori,greeting
+ 89 Next,answer
+ 89 Next,number
+ 3 digit number,answer
+ Three digit number,answer
+ This is too simple,harder
+ Am not a kid,harder
+ Hey Miss Roribcan you ask me some question from Secondary 2,greeting
+ Hey Miss Roribcan you ask me some question from Secondary 2,faq
+ Hey Miss Roribcan you ask me some question from Secondary 2,main menu
+ don't know,hint
+ don't know,easier
+ 𝑴𝒂𝒕𝒉,math
+ Rori can you help me to gat value,
+ I called but u are not picking up,
+ 0.3 answer,answer
+ Sorry rori was101,answer
+ Y is it 6,answer
+ Y is it 6,number
+ 0.3 answer,number
+ Why 0.5,more explanation
+ Why 0.5,number
+ 6\nNext,Next
+ How is the answer is 11,more explanation
+ How comes we have 11,more explanation
+ Yes 6,answer
+ Yes 6,number
+ 6\nNext,number
+ How is the answer is 11,number
+ How comes we have 11,number
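Review note: several utterances in this file appear more than once with different labels (for example `this is stupid` is tagged `skip`, `harder`, and `feedback`), and a few rows have an empty label. A single-label classifier sees the former as contradictory examples, and `dropna()` in `intent_classification.py` discards the latter. The sketch below is one possible way to surface both cases before training. It uses only the standard library and a handful of rows copied from the file; `audit_labels` and `SAMPLE_CSV` are hypothetical names and not part of this commit.

```python
import csv
import io
from collections import defaultdict

# A few rows copied from labeled_data.csv; the real file is longer.
SAMPLE_CSV = """Utterance,Label
this is stupid,skip
this is stupid,harder
this is stupid,feedback
quit,exit
Help and stop asking me stupid questions,
"""


def audit_labels(csv_text):
    """Return (utterances with more than one label, utterances with no label)."""
    labels_by_utterance = defaultdict(set)
    unlabeled = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        utterance = row["Utterance"]
        label = (row["Label"] or "").strip()
        if label:
            labels_by_utterance[utterance].add(label)
        else:
            unlabeled.append(utterance)
    conflicts = {
        utterance: sorted(labels)
        for utterance, labels in labels_by_utterance.items()
        if len(labels) > 1
    }
    return conflicts, unlabeled


conflicts, unlabeled = audit_labels(SAMPLE_CSV)
print(conflicts)
print(unlabeled)
```

Whether conflicting rows should be merged, relabeled, or kept for a multi-label setup is a modeling decision this sketch does not make; it only lists them.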
mathtext_fastapi/intent_classification.py ADDED
@@ -0,0 +1,52 @@
+ import numpy as np
+ import pandas as pd
+
+ from pathlib import Path
+ from sentence_transformers import SentenceTransformer
+ from sklearn.linear_model import LogisticRegression
+ from joblib import dump, load
+
+ def pickle_model(model):
+     DATA_DIR = Path(__file__).parent.parent / "mathtext_fastapi" / "data" / "intent_classification_model.joblib"
+     dump(model, DATA_DIR)
+
+
+ def create_intent_classification_model():
+     encoder = SentenceTransformer('all-MiniLM-L6-v2')
+     # path = list(Path.cwd().glob('*.csv'))
+     DATA_DIR = Path(__file__).parent.parent / "mathtext_fastapi" / "data" / "labeled_data.csv"
+
+     print("DATA_DIR")
+     print(f"{DATA_DIR}")
+
+     with open(f"{DATA_DIR}",'r', newline='', encoding='utf-8') as f:
+         df = pd.read_csv(f)
+     df = df[df.columns[:2]]
+     df = df.dropna()
+     X_explore = np.array([list(encoder.encode(x)) for x in df['Utterance']])
+     X = np.array([list(encoder.encode(x)) for x in df['Utterance']])
+     y = df['Label']
+     model = LogisticRegression(class_weight='balanced')
+     model.fit(X, y, sample_weight=None)
+
+     print("MODEL")
+     print(model)
+
+     pickle_model(model)
+
+
+ def retrieve_intent_classification_model():
+     DATA_DIR = Path(__file__).parent.parent / "mathtext_fastapi" / "data" / "intent_classification_model.joblib"
+     model = load(DATA_DIR)
+     return model
+
+
+ def predict_message_intent(message):
+     encoder = SentenceTransformer('all-MiniLM-L6-v2')
+     model = retrieve_intent_classification_model()
+     tokenized_utterance = np.array([list(encoder.encode(message))])
+     predicted_label = model.predict(tokenized_utterance)
+     predicted_probabilities = model.predict_proba(tokenized_utterance)
+     confidence_score = predicted_probabilities.max()
+
+     return {"type": "intent", "data": predicted_label[0], "confidence": confidence_score}
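Review note: `predict_message_intent` constructs a `SentenceTransformer` and reloads the `.joblib` file on every call, which is likely to dominate request latency. One possible approach is to load these objects once per process and reuse them. The sketch below shows only the caching pattern with `functools.lru_cache`. `_expensive_load` is a hypothetical stand-in for the real constructors, so the example runs without the model files, and `get_resources` is an illustrative name that does not exist in this commit.

```python
from functools import lru_cache

load_count = 0  # how many times the expensive loader actually ran


def _expensive_load():
    """Hypothetical stand-in for SentenceTransformer(...) plus joblib.load(...)."""
    global load_count
    load_count += 1
    return object()


@lru_cache(maxsize=1)
def get_resources():
    """Load the resources on first use and return the cached object afterwards."""
    return _expensive_load()


first = get_resources()
second = get_resources()
print(first is second, load_count)
```

In the real module the cached function would return the encoder and the classifier. Whether a module-level cache suits the deployment (worker count, memory limits) is left open here.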
mathtext_fastapi/nlu.py CHANGED
@@ -2,6 +2,7 @@ from fuzzywuzzy import fuzz
  from mathtext_fastapi.logging import prepare_message_data_for_logging
  from mathtext.sentiment import sentiment
  from mathtext.text2int import text2int
+ from mathtext_fastapi.intent_classification import create_intent_classification_model, retrieve_intent_classification_model, predict_message_intent
  import re
 
 
@@ -142,6 +143,7 @@ def evaluate_message_with_nlu(message_data):
      }
      message_text = message_data['message_body']
 
+     # Run intent classification only for keywords
      intent_api_response = run_intent_classification(message_text)
      if intent_api_response['data']:
          return intent_api_response
@@ -149,6 +151,13 @@ def evaluate_message_with_nlu(message_data):
      number_api_resp = text2int(message_text.lower())
 
      if number_api_resp == 32202:
+         # Run intent classification with logistic regression model
+         predicted_label = predict_message_intent(message_text)
+         if predicted_label['confidence'] > 0.01:
+             nlu_response = predicted_label
+             return nlu_response
+
+         # Run sentiment analysis
          sentiment_api_resp = sentiment(message_text)
          nlu_response = build_nlu_response_object(
              'sentiment',
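Review note on the `> 0.01` threshold: `predict_proba` returns a probability for each class and those probabilities sum to 1, so the largest one can never be below `1 / n_classes`. With well under 100 distinct labels in `labeled_data.csv`, the confidence always exceeds 0.01, which suggests the sentiment branch after it may be unreachable whenever the classifier runs. The sketch below illustrates the arithmetic in plain Python. `N_LABELS = 40` is a rough, unverified estimate of the label count, and both function names are hypothetical.

```python
THRESHOLD = 0.01  # value used in evaluate_message_with_nlu
N_LABELS = 40     # assumption: approximate number of distinct labels


def confidence_floor(n_classes):
    """Smallest possible max probability: reached by the uniform distribution."""
    return 1.0 / n_classes


def passes_threshold(probabilities, threshold=THRESHOLD):
    """Mirror the check in the diff: is the top probability above the threshold?"""
    return max(probabilities) > threshold


# Even a maximally uncertain prediction over N_LABELS classes passes the check.
uniform = [1.0 / N_LABELS] * N_LABELS
print(confidence_floor(N_LABELS), passes_threshold(uniform))

# The check could only fail with more than 1 / THRESHOLD = 100 classes.
print(passes_threshold([1.0 / 200] * 200))
```

A threshold meant to filter out uncertain predictions would need to sit above `1 / n_classes`; choosing the value is a tuning question for held-out data.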
scripts/make_request.py CHANGED
@@ -58,22 +58,23 @@ def run_simulated_request(endpoint, sample_answer, context=None):
      print(request)
 
 
- run_simulated_request('intent-classification', 'exit')
- run_simulated_request('sentiment-analysis', 'I reject it')
- run_simulated_request('text2int', 'seven thousand nine hundred fifty seven')
- run_simulated_request('nlu', 'test message')
- run_simulated_request('nlu', 'eight')
- run_simulated_request('nlu', 'is it 8')
- run_simulated_request('nlu', 'can I know how its 0.5')
- run_simulated_request('nlu', 'eight, nine, ten')
- run_simulated_request('nlu', '8, 9, 10')
- run_simulated_request('nlu', '8')
+ # run_simulated_request('intent-classification', 'exit')
+ # run_simulated_request('intent-classification', "I'm not sure")
+ # run_simulated_request('sentiment-analysis', 'I reject it')
+ # run_simulated_request('text2int', 'seven thousand nine hundred fifty seven')
+ # run_simulated_request('nlu', 'test message')
+ # run_simulated_request('nlu', 'eight')
+ # run_simulated_request('nlu', 'is it 8')
+ # run_simulated_request('nlu', 'can I know how its 0.5')
+ # run_simulated_request('nlu', 'eight, nine, ten')
+ # run_simulated_request('nlu', '8, 9, 10')
+ # run_simulated_request('nlu', '8')
  run_simulated_request('nlu', "I don't know")
- run_simulated_request('nlu', "I don't know eight")
- run_simulated_request('nlu', "I don't 9")
- run_simulated_request('nlu', "0.2")
- run_simulated_request('nlu', 'Today is a wonderful day')
- run_simulated_request('nlu', 'IDK 5?')
+ # run_simulated_request('nlu', "I don't know eight")
+ # run_simulated_request('nlu', "I don't 9")
+ # run_simulated_request('nlu', "0.2")
+ # run_simulated_request('nlu', 'Today is a wonderful day')
+ # run_simulated_request('nlu', 'IDK 5?')
  # run_simulated_request('manager', '')
  # run_simulated_request('manager', 'add')
  # run_simulated_request('manager', 'subtract')