Build error
Merge branch 'feature-intent-model' into 'staging'
Feature intent model
See merge request tangibleai/community/mathtext-fastapi!13
mathtext_fastapi/data/intent_classification_model.joblib
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:ea4954368c3b95673167ce347f2962b5508c4af295b6af58b6c11b3c1075b42e
+size 127903
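Because the model is stored through Git LFS, the repository itself holds only the three-line pointer above, not the 127,903-byte joblib file. If a checkout skips fetching LFS objects, `load()` in `retrieve_intent_classification_model()` would try to unpickle this text and fail. A minimal sketch of recognizing a pointer before loading, assuming the `key value` line layout of the Git LFS pointer specification; `parse_lfs_pointer` and `is_lfs_pointer_file` are hypothetical helpers, not part of this merge request.

```python
from typing import Dict, Optional

# Per the Git LFS pointer spec, the first line names the spec version.
LFS_VERSION_LINE = "version https://git-lfs.github.com/spec/v1"


def parse_lfs_pointer(text: str) -> Optional[Dict[str, str]]:
    """Return the pointer's key/value fields, or None if `text` is not an LFS pointer.

    Hypothetical helper: it checks the version line and the presence of
    `oid` and `size`, not every rule in the specification.
    """
    lines = text.strip().splitlines()
    if not lines or lines[0] != LFS_VERSION_LINE:
        return None
    fields: Dict[str, str] = {}
    for line in lines:
        key, sep, value = line.partition(" ")
        if not sep:  # every pointer line has the form "key value"
            return None
        fields[key] = value
    if "oid" not in fields or "size" not in fields:
        return None
    return fields


def is_lfs_pointer_file(path: str) -> bool:
    """True if the file at `path` holds an LFS pointer instead of the real object."""
    with open(path, "rb") as handle:
        head = handle.read(1024)  # a pointer is only a few lines long
    try:
        return parse_lfs_pointer(head.decode("utf-8")) is not None
    except UnicodeDecodeError:
        return False  # binary content such as a pickle, so not a text pointer
```

Calling `is_lfs_pointer_file(...)` on the model path before `load()` turns an opaque unpickling error into an explicit "LFS object missing" condition.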
mathtext_fastapi/data/labeled_data.csv
ADDED
@@ -0,0 +1,144 @@
+Utterance,Label
+skip this,skip
+this is stupid,skip
+this is stupid,harder
+this is stupid,feedback
+I'm done,exit
+quit,exit
+I don't know,hint
+help,hint
+can I do something else?,main menu
+what's going on,rapport
+what's going on,main menu
+tell me a joke,rapport
+tell me a joke,main menu
+Sorry I don't understand,do not know
+Ten thousand,number
+1.234,number
+"10,000",number
+"123, 456",numbers
+"11, 12, 13",numbers
+"100, 200, 300",numbers
+"100, 200",numbers
+Stop for a minute,wait
+Bye bye,exit
+Good night,exit
+Am done,exit
+Yes,yes
+Help,help
+Idiot,harder
+Stop,exit
+I don't get it,hint
+Math,main menu
+Math,math topic
+Tomorrow let do math,wait
+Later,wait
+Pls i will continue pls,skip
+Rori tell me now,help
+harder,skip
+Stop for now i wont to go to School,exit
+Next,next
+Okay,okay
+Great,affirmation
+Give me for example,example
+No I want to learn algebraic expressions,algebra
+Hi rori,greeting
+*help*,help
+*Next*,next
+Okay nice,okay
+I don't know it,hint
+Nex,next
+I need a help,hint
+Please can I ask your any math questions?,faq
+The answer is 1,answer
+The answer is 1,number
+But 0.8 is also same as . 8 so I was actually right,I'm right
+What is the number system?,faq
+Ok thanks,thanks
+I'm going to school now,exit
+Let's move to another topic,main menu
+"Ummanni saba
+Kebena bara kana galmi keenya inni guddaan bilisummaa qofa #Gabrummaan_ammaan booda_gaha namni hundi bakka jiru irraa kutatee ka,ee jira obboleewwan goototni keenya jiran haqa Kebenaaf jechaa jiru Guraandhala 29 booda walabummaa keenya labsina Dhugaa qabna Ni injifanna *** . Naannoo giddu galeessa Itoophiyaatti #Kebenaan aanaa addaati Kun murtoo ummata Kebenaa hundaati",spam
+Yes it,yes
+U type fast,too fast
+I mean your typing is fast,too fast
+Why do u type so fast,too fast
+Ur typing is fast,too fast
+Can we go to a real work,harder
+I know all this,harder
+Answer this,preamble
+Am tired,exit
+This is not what I asked for,main menu
+Bye,exit
+😱😱😂😂😂😡😰😰😰😒,spam
+Gbxbxbcbcbbcbchcbchc,spam
+I want to solve math,math topic
+Pleas let start with the fraction,fractions topic
+Okey,okay
+i need substraction,subtraction topic
+Can you please stop with me,exit
+Another one,next
+Harder or easy,main menu
+Hard or easier,main menu
+Jump topic,menu
+Got it,okay
+I didn't understand,don't know
+Don't understand,don't know
+Excuse me pls,hint
+Let stop for today,exit
+Help and stop asking me stupid questions,
+Ykay,okay
+Not interested in solving this,menu
+Stpo,exit
+Hiiiiiii,greeting
+Hi rori,greeting
+I've done this things before,harder
+Which number my phone number,
+Unit,main menu
+No ide,don't know
+No ide,hint
+No idea,don't know
+🙈🤩😇🙏,spam
+Thank u,thanks
+Do you know programming,faq
+Delete my number,unsubscribe
+See u,exit
+Can I go for break ??,wait
+I wanna fuck,profanity
+Enough of this nw,exit
+Can we move to equations,equations
+Do you know you are an idiot,insult
+3 digit number,number
+3 digit number,answer
+Three digit number,confident answer
+Three digit number,number
+Good evening Rori,greeting
+89 Next,answer
+89 Next,number
+3 digit number,answer
+Three digit number,answer
+This is too simple,harder
+Am not a kid,harder
+Hey Miss Roribcan you ask me some question from Secondary 2,greeting
+Hey Miss Roribcan you ask me some question from Secondary 2,faq
+Hey Miss Roribcan you ask me some question from Secondary 2,main menu
+don't know,hint
+don't know,easier
+𝑴𝒂𝒕𝒉,math
+Rori can you help me to gat value,
+I called but u are not picking up,
+0.3 answer,answer
+Sorry rori was101,answer
+Y is it 6,answer
+Y is it 6,number
+0.3 answer,number
+Why 0.5,more explanation
+Why 0.5,number
+6\nNext,Next
+How is the answer is 11,more explanation
+How comes we have 11,more explanation
+Yes 6,answer
+Yes 6,number
+6\nNext,number
+How is the answer is 11,number
+How comes we have 11,number
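`create_intent_classification_model()` keeps the first two columns and calls `dropna()`, so the four rows whose Label is empty (for example `Help and stop asking me stupid questions,`) never reach training. Several utterances also carry more than one label (`this is stupid` appears as skip, harder and feedback); identical text yields identical embeddings, so a single-label classifier can predict only one of those labels for it. A minimal stdlib sketch for listing both cases before training; `read_labeled_csv` and `audit_labels` are hypothetical helpers, not part of this merge request.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def read_labeled_csv(text: str) -> List[Tuple[str, str]]:
    """Parse Utterance,Label CSV text; quoted fields such as "10,000" stay intact."""
    reader = csv.DictReader(io.StringIO(text))
    # An empty Label cell arrives as "" (or None for a short row); normalize to "".
    return [(row["Utterance"], row["Label"] or "") for row in reader]


def audit_labels(
    rows: Iterable[Tuple[str, str]],
) -> Tuple[List[str], Dict[str, List[str]]]:
    """Return (utterances with no label, utterances mapped to more than one label)."""
    unlabeled: List[str] = []
    labels_by_utterance: Dict[str, Set[str]] = defaultdict(set)
    for utterance, label in rows:
        if label.strip():
            labels_by_utterance[utterance].add(label)
        else:
            unlabeled.append(utterance)  # these rows are what dropna() removes
    conflicting = {
        utterance: sorted(labels)
        for utterance, labels in labels_by_utterance.items()
        if len(labels) > 1
    }
    return unlabeled, conflicting
```

Usage: read `mathtext_fastapi/data/labeled_data.csv` with `encoding="utf-8", newline=""` and pass the text through `read_labeled_csv` into `audit_labels`; the multi-line quoted spam row is handled by the `csv` module.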
mathtext_fastapi/intent_classification.py
ADDED
@@ -0,0 +1,52 @@
+import numpy as np
+import pandas as pd
+
+from pathlib import Path
+from sentence_transformers import SentenceTransformer
+from sklearn.linear_model import LogisticRegression
+from joblib import dump, load
+
+def pickle_model(model):
+    DATA_DIR = Path(__file__).parent.parent / "mathtext_fastapi" / "data" / "intent_classification_model.joblib"
+    dump(model, DATA_DIR)
+
+
+def create_intent_classification_model():
+    encoder = SentenceTransformer('all-MiniLM-L6-v2')
+    # path = list(Path.cwd().glob('*.csv'))
+    DATA_DIR = Path(__file__).parent.parent / "mathtext_fastapi" / "data" / "labeled_data.csv"
+
+    print("DATA_DIR")
+    print(f"{DATA_DIR}")
+
+    with open(f"{DATA_DIR}",'r', newline='', encoding='utf-8') as f:
+        df = pd.read_csv(f)
+    df = df[df.columns[:2]]
+    df = df.dropna()
+    X_explore = np.array([list(encoder.encode(x)) for x in df['Utterance']])
+    X = np.array([list(encoder.encode(x)) for x in df['Utterance']])
+    y = df['Label']
+    model = LogisticRegression(class_weight='balanced')
+    model.fit(X, y, sample_weight=None)
+
+    print("MODEL")
+    print(model)
+
+    pickle_model(model)
+
+
+def retrieve_intent_classification_model():
+    DATA_DIR = Path(__file__).parent.parent / "mathtext_fastapi" / "data" / "intent_classification_model.joblib"
+    model = load(DATA_DIR)
+    return model
+
+
+def predict_message_intent(message):
+    encoder = SentenceTransformer('all-MiniLM-L6-v2')
+    model = retrieve_intent_classification_model()
+    tokenized_utterance = np.array([list(encoder.encode(message))])
+    predicted_label = model.predict(tokenized_utterance)
+    predicted_probabilities = model.predict_proba(tokenized_utterance)
+    confidence_score = predicted_probabilities.max()
+
+    return {"type": "intent", "data": predicted_label[0], "confidence": confidence_score}
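As committed, `predict_message_intent()` constructs a new `SentenceTransformer('all-MiniLM-L6-v2')` and reloads the joblib file for every message. A minimal sketch of memoizing both loaders with `functools.lru_cache` so each runs once per process, assuming the same model name and file layout; `get_encoder`, `get_model` and the optional `encoder`/`model` parameters are hypothetical additions. Imports are deferred into the loaders so the module can be imported without loading anything.

```python
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=1)
def get_encoder():
    """Build the sentence encoder on first use; later calls reuse the same instance."""
    from sentence_transformers import SentenceTransformer  # deferred heavy import
    return SentenceTransformer("all-MiniLM-L6-v2")


@lru_cache(maxsize=1)
def get_model():
    """Load the trained classifier once per process."""
    from joblib import load  # deferred import
    # Assumes this module sits in mathtext_fastapi/ next to data/, which resolves
    # to the same file as the committed parent.parent / "mathtext_fastapi" path.
    return load(Path(__file__).parent / "data" / "intent_classification_model.joblib")


def predict_message_intent(message, encoder=None, model=None):
    """Classify one message; encoder and model can be injected, e.g. in tests."""
    if encoder is None:
        encoder = get_encoder()
    if model is None:
        model = get_model()
    embedding = encoder.encode([message])  # a list in, one embedding row out
    probabilities = model.predict_proba(embedding)[0]
    # For LogisticRegression, the most probable class is the one predict() returns.
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return {
        "type": "intent",
        "data": model.classes_[best],
        "confidence": float(probabilities[best]),
    }
```

Only the first message pays the loading cost; `get_model.cache_clear()` forces a reload, for example after retraining.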
mathtext_fastapi/nlu.py
CHANGED
@@ -2,6 +2,7 @@ from fuzzywuzzy import fuzz
 from mathtext_fastapi.logging import prepare_message_data_for_logging
 from mathtext.sentiment import sentiment
 from mathtext.text2int import text2int
+from mathtext_fastapi.intent_classification import create_intent_classification_model, retrieve_intent_classification_model, predict_message_intent
 import re
 
 
@@ -142,6 +143,7 @@ def evaluate_message_with_nlu(message_data):
     }
     message_text = message_data['message_body']
 
+    # Run intent classification only for keywords
     intent_api_response = run_intent_classification(message_text)
     if intent_api_response['data']:
         return intent_api_response
@@ -149,6 +151,13 @@ def evaluate_message_with_nlu(message_data):
     number_api_resp = text2int(message_text.lower())
 
     if number_api_resp == 32202:
+        # Run intent classification with logistic regression model
+        predicted_label = predict_message_intent(message_text)
+        if predicted_label['confidence'] > 0.01:
+            nlu_response = predicted_label
+            return nlu_response
+
+        # Run sentiment analysis
         sentiment_api_resp = sentiment(message_text)
         nlu_response = build_nlu_response_object(
             'sentiment',
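The new branch in `evaluate_message_with_nlu()` returns the classifier's answer whenever `confidence > 0.01`. Because `predict_proba` returns a probability distribution over the K labels seen in training, its maximum can never fall below 1/K. If the committed model was trained on `labeled_data.csv` above, which has 40 distinct non-empty labels (counting `Next` and `next` separately), the floor is 1/40 = 0.025, so the check always passes and the sentiment fallback after it is not reached. A minimal sketch of that bound, plus one possible alternative rule that compares the top probability with the uniform baseline; `clears_uniform_baseline` and its `margin=2.0` default are hypothetical, not something this merge request specifies.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: turns raw class scores into probabilities."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def confidence_floor(num_classes: int) -> float:
    """Smallest possible max(p) for a distribution over num_classes classes."""
    return 1.0 / num_classes


def clears_uniform_baseline(probabilities: Sequence[float], margin: float = 2.0) -> bool:
    """Hypothetical rule: accept only if max(p) is at least `margin` times 1/K."""
    return max(probabilities) >= margin * confidence_floor(len(probabilities))
```

The softmax here stands in for `predict_proba`: even all-equal scores over 40 classes give a top probability of 0.025, which already clears the 0.01 cutoff, while the baseline-relative rule rejects that case.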
requirements.txt
CHANGED
@@ -8,6 +8,7 @@ pydantic==1.10.*
 python-Levenshtein
 requests==2.27.*
 sentencepiece==0.1.*
+sentence-transformers
 supabase
 transitions
 uvicorn==0.17.*
scripts/make_request.py
CHANGED
@@ -58,22 +58,23 @@ def run_simulated_request(endpoint, sample_answer, context=None):
     print(request)
 
 
-run_simulated_request('intent-classification', 'exit')
-run_simulated_request('
-run_simulated_request('
-run_simulated_request('
-run_simulated_request('nlu', '
-run_simulated_request('nlu', '
-run_simulated_request('nlu', '
-run_simulated_request('nlu', '
-run_simulated_request('nlu', '
-run_simulated_request('nlu', '8')
+# run_simulated_request('intent-classification', 'exit')
+# run_simulated_request('intent-classification', "I'm not sure")
+# run_simulated_request('sentiment-analysis', 'I reject it')
+# run_simulated_request('text2int', 'seven thousand nine hundred fifty seven')
+# run_simulated_request('nlu', 'test message')
+# run_simulated_request('nlu', 'eight')
+# run_simulated_request('nlu', 'is it 8')
+# run_simulated_request('nlu', 'can I know how its 0.5')
+# run_simulated_request('nlu', 'eight, nine, ten')
+# run_simulated_request('nlu', '8, 9, 10')
+# run_simulated_request('nlu', '8')
 run_simulated_request('nlu', "I don't know")
-run_simulated_request('nlu', "I don't know eight")
-run_simulated_request('nlu', "I don't 9")
-run_simulated_request('nlu', "0.2")
-run_simulated_request('nlu', 'Today is a wonderful day')
-run_simulated_request('nlu', 'IDK 5?')
+# run_simulated_request('nlu', "I don't know eight")
+# run_simulated_request('nlu', "I don't 9")
+# run_simulated_request('nlu', "0.2")
+# run_simulated_request('nlu', 'Today is a wonderful day')
+# run_simulated_request('nlu', 'IDK 5?')
 # run_simulated_request('manager', '')
 # run_simulated_request('manager', 'add')
 # run_simulated_request('manager', 'subtract')