Merge branch 'feature-intent-model' into 'staging'
Feature intent model
See merge request tangibleai/community/mathtext-fastapi!13
mathtext_fastapi/data/intent_classification_model.joblib
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:ea4954368c3b95673167ce347f2962b5508c4af295b6af58b6c11b3c1075b42e
+size 127903
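The `.joblib` model is committed through Git LFS, so the repository itself holds only the three-line pointer above. The real 127,903-byte object has to be fetched separately (for example with `git lfs pull`). If that step is skipped, `retrieve_intent_classification_model` hands `joblib.load` this small text file instead of a pickled model. The following is a minimal sketch of a pre-load check, assuming you want to tell the two cases apart. `looks_like_lfs_pointer`, `parse_lfs_pointer`, `matches_pointer` and `is_lfs_pointer_file` are hypothetical helpers and are not part of the repository.

```python
import hashlib

LFS_SPEC_LINE = "version https://git-lfs.github.com/spec/v1"


def looks_like_lfs_pointer(head: bytes) -> bool:
    """True if the first bytes of a file are a Git LFS pointer header."""
    return head.startswith(LFS_SPEC_LINE.encode("ascii"))


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a pointer file into a dict entry."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def matches_pointer(data: bytes, pointer: dict) -> bool:
    """Check downloaded bytes against the pointer's size and sha256 oid."""
    algo, _, digest = pointer["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return len(data) == int(pointer["size"]) and hashlib.sha256(data).hexdigest() == digest


def is_lfs_pointer_file(path) -> bool:
    """Read just enough of the file at `path` to tell a pointer from the real object."""
    with open(path, "rb") as f:
        return looks_like_lfs_pointer(f.read(len(LFS_SPEC_LINE)))


# The pointer committed for intent_classification_model.joblib:
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:ea4954368c3b95673167ce347f2962b5508c4af295b6af58b6c11b3c1075b42e
size 127903
"""
```

Calling `is_lfs_pointer_file(...)` on the checked-out `.joblib` before `load(...)` turns a missing LFS download into an explicit condition. Otherwise you only see whatever error `joblib.load` raises when it is given a text file.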
mathtext_fastapi/data/labeled_data.csv
ADDED
@@ -0,0 +1,144 @@
+Utterance,Label
+skip this,skip
+this is stupid,skip
+this is stupid,harder
+this is stupid,feedback
+I'm done,exit
+quit,exit
+I don't know,hint
+help,hint
+can I do something else?,main menu
+what's going on,rapport
+what's going on,main menu
+tell me a joke,rapport
+tell me a joke,main menu
+Sorry I don't understand,do not know
+Ten thousand,number
+1.234,number
+"10,000",number
+"123, 456",numbers
+"11, 12, 13",numbers
+"100, 200, 300",numbers
+"100, 200",numbers
+Stop for a minute,wait
+Bye bye,exit
+Good night,exit
+Am done,exit
+Yes,yes
+Help,help
+Idiot,harder
+Stop,exit
+I don't get it,hint
+Math,main menu
+Math,math topic
+Tomorrow let do math,wait
+Later,wait
+Pls i will continue pls,skip
+Rori tell me now,help
+harder,skip
+Stop for now i wont to go to School,exit
+Next,next
+Okay,okay
+Great,affirmation
+Give me for example,example
+No I want to learn algebraic expressions,algebra
+Hi rori,greeting
+*help*,help
+*Next*,next
+Okay nice,okay
+I don't know it,hint
+Nex,next
+I need a help,hint
+Please can I ask your any math questions?,faq
+The answer is 1,answer
+The answer is 1,number
+But 0.8 is also same as . 8 so I was actually right,I'm right
+What is the number system?,faq
+Ok thanks,thanks
+I'm going to school now,exit
+Let's move to another topic,main menu
+"Ummanni saba
+Kebena bara kana galmi keenya inni guddaan bilisummaa qofa #Gabrummaan_ammaan booda_gaha namni hundi bakka jiru irraa kutatee ka,ee jira obboleewwan goototni keenya jiran haqa Kebenaaf jechaa jiru Guraandhala 29 booda walabummaa keenya labsina Dhugaa qabna Ni injifanna *** . Naannoo giddu galeessa Itoophiyaatti #Kebenaan aanaa addaati Kun murtoo ummata Kebenaa hundaati",spam
+Yes it,yes
+U type fast,too fast
+I mean your typing is fast,too fast
+Why do u type so fast,too fast
+Ur typing is fast,too fast
+Can we go to a real work,harder
+I know all this,harder
+Answer this,preamble
+Am tired,exit
+This is not what I asked for,main menu
+Bye,exit
+😱😱😂😂😂😡😰😰😰😒,spam
+Gbxbxbcbcbbcbchcbchc,spam
+I want to solve math,math topic
+Pleas let start with the fraction,fractions topic
+Okey,okay
+i need substraction,subtraction topic
+Can you please stop with me,exit
+Another one,next
+Harder or easy,main menu
+Hard or easier,main menu
+Jump topic,menu
+Got it,okay
+I didn't understand,don't know
+Don't understand,don't know
+Excuse me pls,hint
+Let stop for today,exit
+Help and stop asking me stupid questions,
+Ykay,okay
+Not interested in solving this,menu
+Stpo,exit
+Hiiiiiii,greeting
+Hi rori,greeting
+I've done this things before,harder
+Which number my phone number,
+Unit,main menu
+No ide,don't know
+No ide,hint
+No idea,don't know
+🙈🤩😇🙏,spam
+Thank u,thanks
+Do you know programming,faq
+Delete my number,unsubscribe
+See u,exit
+Can I go for break ??,wait
+I wanna fuck,profanity
+Enough of this nw,exit
+Can we move to equations,equations
+Do you know you are an idiot,insult
+3 digit number,number
+3 digit number,answer
+Three digit number,confident answer
+Three digit number,number
+Good evening Rori,greeting
+89 Next,answer
+89 Next,number
+3 digit number,answer
+Three digit number,answer
+This is too simple,harder
+Am not a kid,harder
+Hey Miss Roribcan you ask me some question from Secondary 2,greeting
+Hey Miss Roribcan you ask me some question from Secondary 2,faq
+Hey Miss Roribcan you ask me some question from Secondary 2,main menu
+don't know,hint
+don't know,easier
+𝑴𝒂𝒕𝒉,math
+Rori can you help me to gat value,
+I called but u are not picking up,
+0.3 answer,answer
+Sorry rori was101,answer
+Y is it 6,answer
+Y is it 6,number
+0.3 answer,number
+Why 0.5,more explanation
+Why 0.5,number
+6\nNext,Next
+How is the answer is 11,more explanation
+How comes we have 11,more explanation
+Yes 6,answer
+Yes 6,number
+6\nNext,number
+How is the answer is 11,number
+How comes we have 11,number
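Several utterances in `labeled_data.csv` carry more than one label. For example, `this is stupid` is labeled `skip`, `harder` and `feedback`. A few rows also have an empty label. `create_intent_classification_model` removes the empty-label rows with `df.dropna()`, because by default pandas reads an empty field as NaN. The following is a minimal standard-library sketch for listing both cases before training. `audit_labels` and the `SAMPLE` excerpt are illustrative and are not part of the repository.

```python
import csv
import io
from collections import defaultdict


def audit_labels(csv_text):
    """Return (utterances with an empty label, {utterance: sorted labels} for conflicts)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    empty = []
    labels_by_utterance = defaultdict(set)
    for row in reader:
        utterance = row["Utterance"]
        # A missing trailing field comes back as None; an empty one as "".
        label = (row["Label"] or "").strip()
        if not label:
            empty.append(utterance)
            continue
        labels_by_utterance[utterance].add(label)
    conflicts = {u: sorted(ls) for u, ls in labels_by_utterance.items() if len(ls) > 1}
    return empty, conflicts


# A few rows copied from labeled_data.csv; quoted fields keep their commas.
SAMPLE = (
    "Utterance,Label\n"
    "this is stupid,skip\n"
    "this is stupid,harder\n"
    "this is stupid,feedback\n"
    "quit,exit\n"
    "Help and stop asking me stupid questions,\n"
    '"10,000",number\n'
)
```

Running `audit_labels(open(path, encoding="utf-8").read())` on the full file shows which rows `dropna()` will discard and which utterances present the classifier with competing targets. The audit does not catch near-duplicate label spellings such as `next`/`Next` or `don't know`/`do not know`.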
mathtext_fastapi/intent_classification.py
ADDED
@@ -0,0 +1,52 @@
+import numpy as np
+import pandas as pd
+
+from pathlib import Path
+from sentence_transformers import SentenceTransformer
+from sklearn.linear_model import LogisticRegression
+from joblib import dump, load
+
+def pickle_model(model):
+    DATA_DIR = Path(__file__).parent.parent / "mathtext_fastapi" / "data" / "intent_classification_model.joblib"
+    dump(model, DATA_DIR)
+
+
+def create_intent_classification_model():
+    encoder = SentenceTransformer('all-MiniLM-L6-v2')
+    # path = list(Path.cwd().glob('*.csv'))
+    DATA_DIR = Path(__file__).parent.parent / "mathtext_fastapi" / "data" / "labeled_data.csv"
+
+    print("DATA_DIR")
+    print(f"{DATA_DIR}")
+
+    with open(f"{DATA_DIR}",'r', newline='', encoding='utf-8') as f:
+        df = pd.read_csv(f)
+    df = df[df.columns[:2]]
+    df = df.dropna()
+    X_explore = np.array([list(encoder.encode(x)) for x in df['Utterance']])
+    X = np.array([list(encoder.encode(x)) for x in df['Utterance']])
+    y = df['Label']
+    model = LogisticRegression(class_weight='balanced')
+    model.fit(X, y, sample_weight=None)
+
+    print("MODEL")
+    print(model)
+
+    pickle_model(model)
+
+
+def retrieve_intent_classification_model():
+    DATA_DIR = Path(__file__).parent.parent / "mathtext_fastapi" / "data" / "intent_classification_model.joblib"
+    model = load(DATA_DIR)
+    return model
+
+
+def predict_message_intent(message):
+    encoder = SentenceTransformer('all-MiniLM-L6-v2')
+    model = retrieve_intent_classification_model()
+    tokenized_utterance = np.array([list(encoder.encode(message))])
+    predicted_label = model.predict(tokenized_utterance)
+    predicted_probabilities = model.predict_proba(tokenized_utterance)
+    confidence_score = predicted_probabilities.max()
+
+    return {"type": "intent", "data": predicted_label[0], "confidence": confidence_score}
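As written, `predict_message_intent` constructs a new `SentenceTransformer('all-MiniLM-L6-v2')` and reloads the `.joblib` file on every incoming message. One common way to load each object once per process is to wrap the loaders in `functools.lru_cache`. The following is a sketch of that pattern only. So that it runs without `sentence_transformers` or scikit-learn installed, `get_encoder` and `get_classifier` return hypothetical toy stand-ins in place of the real encoder and model. In the real module they would return `SentenceTransformer(...)` and `load(...)`, and the label order would come from the fitted model's `classes_`.

```python
from functools import lru_cache

# Counts how often each loader body actually runs (for illustration only).
load_calls = {"encoder": 0, "classifier": 0}


@lru_cache(maxsize=None)
def get_encoder():
    # Hypothetical stand-in for SentenceTransformer('all-MiniLM-L6-v2').
    load_calls["encoder"] += 1
    return lambda text: [float(len(text))]


@lru_cache(maxsize=None)
def get_classifier():
    # Hypothetical stand-in for load(<path to intent_classification_model.joblib>).
    load_calls["classifier"] += 1

    def predict_proba(vectors):
        # Toy rule: short messages lean "exit", longer ones lean "faq".
        return [[0.9, 0.1] if v[0] < 10 else [0.2, 0.8] for v in vectors]

    return ["exit", "faq"], predict_proba


def predict_message_intent_cached(message):
    """Same response shape as predict_message_intent, but loaders run once per process."""
    encode = get_encoder()
    labels, predict_proba = get_classifier()
    probs = predict_proba([encode(message)])[0]
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"type": "intent", "data": labels[best], "confidence": probs[best]}
```

After any number of calls, `load_calls` stays at one load each. The first request pays the loading cost and later requests reuse the cached objects.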
mathtext_fastapi/nlu.py
CHANGED
@@ -2,6 +2,7 @@ from fuzzywuzzy import fuzz
 from mathtext_fastapi.logging import prepare_message_data_for_logging
 from mathtext.sentiment import sentiment
 from mathtext.text2int import text2int
+from mathtext_fastapi.intent_classification import create_intent_classification_model, retrieve_intent_classification_model, predict_message_intent
 import re
 
 
@@ -142,6 +143,7 @@ def evaluate_message_with_nlu(message_data):
     }
     message_text = message_data['message_body']
 
+    # Run intent classification only for keywords
     intent_api_response = run_intent_classification(message_text)
     if intent_api_response['data']:
         return intent_api_response
@@ -149,6 +151,13 @@ def evaluate_message_with_nlu(message_data):
     number_api_resp = text2int(message_text.lower())
 
     if number_api_resp == 32202:
+        # Run intent classification with logistic regression model
+        predicted_label = predict_message_intent(message_text)
+        if predicted_label['confidence'] > 0.01:
+            nlu_response = predicted_label
+            return nlu_response
+
+        # Run sentiment analysis
         sentiment_api_resp = sentiment(message_text)
         nlu_response = build_nlu_response_object(
             'sentiment',
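In the new `evaluate_message_with_nlu` branch, the classifier's result is returned whenever `confidence > 0.01`, where `confidence` is the largest value from `predict_proba`. Those probabilities sum to 1 over the model's K classes, so the largest one is always at least 1/K. Assume the saved model was trained on `labeled_data.csv`, which has far fewer than 100 distinct labels. Then that lower bound already exceeds 0.01, so the check always passes and the `# Run sentiment analysis` fallback is not reached on this path. The following plain-Python sketch illustrates the bound. `route`, `min_top_probability` and the 0.5 threshold are illustrative assumptions, not values from the project.

```python
def min_top_probability(num_classes: int) -> float:
    """Lower bound on max(p) for any probability vector p over num_classes classes."""
    # K non-negative numbers summing to 1 cannot all be below 1/K.
    return 1.0 / num_classes


def route(probabilities, labels, threshold):
    """Return the most likely label if its probability exceeds `threshold`,
    otherwise None so the caller can fall through (e.g. to sentiment analysis)."""
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    if probabilities[best] > threshold:
        return labels[best]
    return None


# For every K below 100, the bound 1/K is above 0.01, so a 0.01 threshold never rejects.
for k in (2, 10, 50, 99):
    assert min_top_probability(k) > 0.01
```

A perfectly uniform (maximally uncertain) prediction still clears a 0.01 threshold, while a higher cut-off lets low-confidence messages fall through to the sentiment branch. The right value would need to be tuned on held-out messages.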
requirements.txt
CHANGED
@@ -8,6 +8,7 @@ pydantic==1.10.*
 python-Levenshtein
 requests==2.27.*
 sentencepiece==0.1.*
+sentence-transformers
 supabase
 transitions
 uvicorn==0.17.*
scripts/make_request.py
CHANGED
@@ -58,22 +58,23 @@ def run_simulated_request(endpoint, sample_answer, context=None):
     print(request)
 
 
-run_simulated_request('intent-classification', 'exit')
-run_simulated_request('
-run_simulated_request('
-run_simulated_request('
-run_simulated_request('nlu', '
-run_simulated_request('nlu', '
-run_simulated_request('nlu', '
-run_simulated_request('nlu', '
-run_simulated_request('nlu', '
-run_simulated_request('nlu', '8')
+# run_simulated_request('intent-classification', 'exit')
+# run_simulated_request('intent-classification', "I'm not sure")
+# run_simulated_request('sentiment-analysis', 'I reject it')
+# run_simulated_request('text2int', 'seven thousand nine hundred fifty seven')
+# run_simulated_request('nlu', 'test message')
+# run_simulated_request('nlu', 'eight')
+# run_simulated_request('nlu', 'is it 8')
+# run_simulated_request('nlu', 'can I know how its 0.5')
+# run_simulated_request('nlu', 'eight, nine, ten')
+# run_simulated_request('nlu', '8, 9, 10')
+# run_simulated_request('nlu', '8')
 run_simulated_request('nlu', "I don't know")
-run_simulated_request('nlu', "I don't know eight")
-run_simulated_request('nlu', "I don't 9")
-run_simulated_request('nlu', "0.2")
-run_simulated_request('nlu', 'Today is a wonderful day')
-run_simulated_request('nlu', 'IDK 5?')
+# run_simulated_request('nlu', "I don't know eight")
+# run_simulated_request('nlu', "I don't 9")
+# run_simulated_request('nlu', "0.2")
+# run_simulated_request('nlu', 'Today is a wonderful day')
+# run_simulated_request('nlu', 'IDK 5?')
 # run_simulated_request('manager', '')
 # run_simulated_request('manager', 'add')
 # run_simulated_request('manager', 'subtract')