Customer Conversion Prediction with Markov Chain Classifier

Business Requirement
====================
For online users, conversion generally refers to a user action that results in some tangible gain for a business, e.g., a user opening an account or making a first purchase. Next to drawing a large number of users to a website, getting a user to convert is the most critical event in a user's relationship with an online business. Being able to predict when a user will convert to become a customer is an important tool for an online business to have at its disposal. A business could initiate a targeted marketing campaign based on the prediction result.

There are many relevant attributes for a web session. We will consider only the following as part of the demo.
1. Time elapsed since the last visit
2. Time spent in the session

To keep the state transition matrix manageable, we discretize each attribute into 3 levels: High, Medium, and Low. With two attributes at three levels each, we end up with 3 x 3 = 9 states in our problem. Each session is characterized by a two-letter symbol that stands for a state. For example, HM implies that the time elapsed since the last session is high and the time spent in the current session is medium.

Here is some sample input data:
4F014156K07N,LL,ML,HH,HL,LL,HM,HL,LH,ML,HH,HL,LH
G7C0M9H5SUZ1,HL,LM,HL,MH,HH,HH,ML,HL
GWBX875AD31D,LL,HM,HL,HL,HM
KRO2F24JUDE5,HL,HM,HM,HL,HM,MH,HM,HL,HL
3J0G4BB9BI1Q,LM,LH,LH,MH,LM,MH,LH

Here is the output for the above data:
4F014156K07N,F,LL,ML,HH,HL,LL,HM,HL,LH,ML,HH,HL,LH
G7C0M9H5SUZ1,F,HL,LM,HL,MH,HH,HH,ML,HL
GWBX875AD31D,F,LL,HM,HL,HL,HM
KRO2F24JUDE5,T,HL,HM,HM,HL,HM,MH,HM,HL,HL
3J0G4BB9BI1Q,F,LM,LH,LH,MH,LM,MH,LH

Each line in the output consists of the following:
1. Cookie ID (or User ID)
2. Class variable indicating whether the user converted or not (T for true, F for false)
3. Sequence of session data, where each element of the sequence is a two-letter symbol

Setup
=====
Install matumizi, which is a package for data exploration and various other utilities
pip3 install -i https://test.pypi.org/simple/ matumizi==0.0.3

Make sure you have the supv directory at the same level as your working directory containing
visit_history.py
mcclf_cc.properties

Generate training data
======================
python3 visit_history.py --op gen --nuser 1000 --crate 10 --label true >> cc_tr.txt

nuser = number of users
crate = conversion rate
label = whether the class label should be created

Train model
===========
python3 visit_history.py --op train --mlfpath mcclf_cc.properties

Generate prediction data
========================
python3 visit_history.py --op gen --nuser 100 --crate 10 --label false >> cc_pr.txt

Predict
=======
python3 visit_history.py --op pred --mlfpath mcclf_cc.properties
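
Classifier sketch
=================
To make the state encoding and the classification idea described above concrete, here is a minimal Python sketch. It is not the code in visit_history.py. All function names, the cut points used for discretization, and the add-one smoothing are assumptions made for illustration. It maps the two session attributes to one of the 9 states, estimates one transition matrix per class from labeled lines in the format shown above, and labels a sequence by the sign of its log odds.

```python
import math
from itertools import product

LEVELS = "LMH"
# All 9 two-letter states: first letter = time elapsed, second = time spent.
STATES = ["".join(p) for p in product(LEVELS, repeat=2)]

def discretize(value, low_cut, high_cut):
    """Map a numeric value to L, M or H using two (hypothetical) cut points."""
    if value < low_cut:
        return "L"
    if value < high_cut:
        return "M"
    return "H"

def to_state(elapsed, spent, elapsed_cuts, spent_cuts):
    """Combine the two discretized attributes into one state symbol."""
    return discretize(elapsed, *elapsed_cuts) + discretize(spent, *spent_cuts)

def train(lines, smoothing=1.0):
    """Estimate one transition matrix per class from 'id,label,s1,s2,...' lines."""
    counts = {c: {s: {t: smoothing for t in STATES} for s in STATES}
              for c in ("T", "F")}
    for line in lines:
        fields = line.strip().split(",")
        label, seq = fields[1], fields[2:]
        for a, b in zip(seq, seq[1:]):
            counts[label][a][b] += 1
    # Normalize each row of counts into transition probabilities.
    return {c: {s: {t: n / sum(row.values()) for t, n in row.items()}
                for s, row in rows.items()}
            for c, rows in counts.items()}

def log_odds(model, seq):
    """Sum of log(P_T(a->b) / P_F(a->b)) over consecutive state pairs."""
    return sum(math.log(model["T"][a][b] / model["F"][a][b])
               for a, b in zip(seq, seq[1:]))

def predict(model, seq, threshold=0.0):
    """Label a sequence T when its log odds exceed the threshold."""
    return "T" if log_odds(model, seq) > threshold else "F"

# The five labeled lines from the sample output above, used only as toy data.
SAMPLE = [
    "4F014156K07N,F,LL,ML,HH,HL,LL,HM,HL,LH,ML,HH,HL,LH",
    "G7C0M9H5SUZ1,F,HL,LM,HL,MH,HH,HH,ML,HL",
    "GWBX875AD31D,F,LL,HM,HL,HL,HM",
    "KRO2F24JUDE5,T,HL,HM,HM,HL,HM,MH,HM,HL,HL",
    "3J0G4BB9BI1Q,F,LM,LH,LH,MH,LM,MH,LH",
]

model = train(SAMPLE)
converted_seq = SAMPLE[3].split(",")[2:]
print(predict(model, converted_seq), round(log_odds(model, converted_seq), 3))
```

Five sequences are far too few to learn meaningful transition probabilities, so the sample is only there to show the data flow. The threshold of 0 ignores the class prior. With a low conversion rate, a different threshold may be more appropriate.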