Michael Shuffett Spring 2014 ECE 6504 Probabilistic Graphical Models: Class Project Virginia Tech
Goal
Predict a purchased policy based on transaction history.
The basis of this project is the ongoing Kaggle competition titled Allstate Purchase Prediction Challenge. As a customer shops for an insurance policy, he or she receives a number of quotes with different coverage options before purchasing a plan. In this challenge, each customer's history is represented as a series of rows that include a customer ID, information about the customer, information about the quoted policy, and its cost. The competition objective is to predict the purchased coverage options using only a limited subset of the total interaction history. If the eventual purchase can be predicted earlier in the shopping window, the quoting process is shortened and the issuer is less likely to lose the customer's business.
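To make the data layout concrete, the following is a minimal sketch of how quote rows might be grouped per customer and truncated to a limited history. The `QuoteRow` fields and the `split_histories` helper are hypothetical simplifications, not the competition's actual file schema. The sketch assumes each row carries a flag marking the row that records the purchase.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class QuoteRow(NamedTuple):
    # Hypothetical, simplified row layout: one row per quote or purchase.
    customer_id: int
    shopping_pt: int            # position of this row in the customer's history
    is_purchase: bool           # True for the row recording the purchased plan
    options: Tuple[int, ...]    # the seven coverage option values
    cost: float


def split_histories(
    rows: List[QuoteRow], max_quotes: int
) -> Dict[int, Tuple[List[QuoteRow], Tuple[int, ...]]]:
    """Group rows by customer, keep only the first `max_quotes` quotes, and
    return (truncated quote history, purchased options) for each customer."""
    by_customer: Dict[int, List[QuoteRow]] = defaultdict(list)
    for row in rows:
        by_customer[row.customer_id].append(row)

    result = {}
    for cid, history in by_customer.items():
        history.sort(key=lambda r: r.shopping_pt)
        quotes = [r for r in history if not r.is_purchase]
        purchases = [r for r in history if r.is_purchase]
        if not purchases:
            continue  # no label available, e.g. a test-set customer
        result[cid] = (quotes[:max_quotes], purchases[0].options)
    return result
```

Truncating with `max_quotes` mimics the competition setting, where only part of each shopping history is visible at prediction time.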
Approach
Each categorical feature was converted to a set of binary (one-hot) features. Additionally, each variable at each position 1 through 12 of a customer's history was treated as a separate variable, with a dummy value used when the customer had fewer history entries than that position. This led to 53,582 sparse features per customer. The feature space was then reduced to its most informative components using either Randomized PCA or Truncated SVD. Both produced similar results, but Truncated SVD handled the sparse matrices more efficiently. I trained a linear SVM for each of the seven options that make up a purchase. I also trained a random forest with 100 trees, as well as a few ad hoc models that made simple predictions based on the most recently viewed quotes.
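The per-position encoding described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the `MISSING` marker, the `pos{n}:{var}={value}` naming scheme, and the helper functions are hypothetical, and every variable value is treated as a categorical string.

```python
from typing import Dict, List, Sequence

MAX_HISTORY = 12          # history positions 1..12, as in the text above
MISSING = "__missing__"   # hypothetical dummy value for absent history entries


def encode_history(history: Sequence[Dict[str, str]],
                   variables: Sequence[str]) -> List[str]:
    """Return the names of the binary features that equal 1 for this customer.

    Each (history position, variable, value) triple becomes its own binary
    feature; positions beyond the customer's history take the MISSING value,
    and entries after position MAX_HISTORY are ignored.
    """
    active = []
    for pos in range(1, MAX_HISTORY + 1):
        entry = history[pos - 1] if pos <= len(history) else None
        for var in variables:
            value = entry[var] if entry is not None else MISSING
            active.append(f"pos{pos}:{var}={value}")
    return active


def build_vocabulary(all_active: Sequence[List[str]]) -> Dict[str, int]:
    """Assign a column index to every feature name seen in the training data."""
    vocab: Dict[str, int] = {}
    for names in all_active:
        for name in names:
            vocab.setdefault(name, len(vocab))
    return vocab


def to_sparse_row(names: List[str], vocab: Dict[str, int]) -> List[int]:
    """Column indices of the nonzero (value 1) entries; unseen names are dropped."""
    return sorted(vocab[n] for n in names if n in vocab)
```

In a full pipeline these column indices would populate a sparse matrix (for example SciPy's `csr_matrix`) that is then passed to a sparse-friendly reduction step such as scikit-learn's `TruncatedSVD`, followed by one classifier per coverage option.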
Results
A prediction counts as correct only if all seven options are predicted correctly. The following chart shows the accuracy of each model, along with the score of the current competition leader for comparison.
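Because the metric is all-or-nothing, a model that gets most individual options right can still score poorly. The following minimal sketch implements the metric together with one simple predictor in the spirit of the ad hoc models, which repeats the most recently quoted plan. The function names are hypothetical, and this baseline is an illustration, not necessarily the exact ad hoc model used in this project.

```python
from typing import List, Sequence, Tuple

Plan = Tuple[int, ...]  # the seven coverage option values of one plan


def all_or_nothing_accuracy(predicted: Sequence[Plan],
                            actual: Sequence[Plan]) -> float:
    """Fraction of customers whose seven predicted options all match the purchase."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        return 0.0
    hits = sum(1 for p, a in zip(predicted, actual) if tuple(p) == tuple(a))
    return hits / len(actual)


def last_quote_baseline(histories: Sequence[Sequence[Plan]]) -> List[Plan]:
    """Predict each customer's purchase as their most recently quoted plan."""
    return [tuple(h[-1]) for h in histories]


histories = [[(0,) * 7, (1,) * 7], [(2,) * 7]]
actual = [(1,) * 7, (2, 2, 2, 2, 2, 2, 1)]
preds = last_quote_baseline(histories)
print(all_or_nothing_accuracy(preds, actual))  # 0.5
```

The second customer matches six of the seven options but still counts as a miss, which is why per-option accuracy overstates how well a model performs on this task.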