Parallel Matrix Factorization for Recommender Systems using Social Network Information
Yao Zhang, Liangzhe Chen
Spring 2014, ECE 6504 Probabilistic Graphical Models: Class Project, Virginia Tech
The goal of our project is to integrate social network information with classic matrix factorization
to build a recommender system that predicts a user's preferences more accurately.
The basis of our project is the recommender system. Companies like Amazon and Hulu use recommender
systems to predict their customers' preferences for products based on previous purchases, and to
recommend products those customers are likely to value. The more accurately the system predicts
customers' preferences, the more profit it brings.
One of the major approaches to building a recommender system is matrix factorization. While this
method has been widely used and studied, we haven't seen matrix factorization methods that take into
account the social network information of users, and that missing information may help make better
predictions: a person's preferences are likely to be similar to those of their close friends.
Knowing the friendship relations between users should therefore improve current matrix factorization
methods. That is where our idea comes from.
We first look at classic matrix factorization. The dataset is an n by d matrix R, where n is the
number of users, d is the number of items, and R(i,j) is the rating for item j from user i. This
matrix typically has many missing entries, because each user rates only a very limited number of
items. The task is to reconstruct and complete the matrix so that the values filled into the missing
entries are consistent with the data points already in the matrix.
The intuition behind matrix factorization is that there should be some latent rules by which users
rate items, and we capture these latent rules by factorizing the original data matrix into two
parts: one represents how users are connected to the rules, and the other represents how the rules
are connected to items.
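In symbols, with r denoting the number of latent rules (20 in our experiments), the decomposition is:

$$R \approx U V, \qquad U \in \mathbb{R}^{n \times r},\ V \in \mathbb{R}^{r \times d}.$$

Writing V as r by d, so that its columns correspond to items, is just our convention here.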
Now we want to estimate the matrices U and V such that the error between the product of these two
matrices and the original data matrix is minimized. People usually also add regularization on the
elements of U and V to avoid overfitting. The resulting optimization problem for matrix
factorization is shown below.
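A standard way to write this objective, in our notation (the sum runs over the observed entries of R, and λ is the regularization factor we vary in the experiments):

$$\min_{U,V}\ \sum_{(i,j)\,:\,R(i,j)\ \text{observed}} \big(R(i,j) - U_i V_j\big)^2 \;+\; \lambda\left(\lVert U\rVert_F^2 + \lVert V\rVert_F^2\right),$$

where $U_i$ is the i-th row of U and $V_j$ is the j-th column of V.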
This optimization problem can be solved by stochastic gradient descent: take the derivative of the
objective function, set a learning rate, and update U and V iteratively until the objective becomes
stable. The update rules for U and V are shown below.
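With learning rate η (0.01 in our experiments), the standard stochastic updates for each observed rating R(i,j) are:

$$e_{ij} = R(i,j) - U_i V_j,$$
$$U_i \leftarrow U_i + \eta\,\big(e_{ij} V_j^{\top} - \lambda U_i\big), \qquad V_j \leftarrow V_j + \eta\,\big(e_{ij} U_i^{\top} - \lambda V_j\big).$$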
Now, to add social network information into matrix factorization, we need to adjust the objective
function accordingly. Besides minimizing the error function, we also want to minimize the
dissimilarity of preferences between users who are friends with each other. The optimization problem
now takes the following form.
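A sketch of the adjusted objective, where F denotes the set of friend pairs and β weights the social term (the exact weighting used in our implementation may differ):

$$\min_{U,V}\ \sum_{(i,j)\,:\,R(i,j)\ \text{observed}} \big(R(i,j) - U_i V_j\big)^2 \;+\; \lambda\left(\lVert U\rVert_F^2 + \lVert V\rVert_F^2\right) \;+\; \beta \sum_{(i,f)\in F} \lVert U_i - U_f \rVert^2.$$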
The good part of this optimization problem is that the social term we add is still quadratic, so we
can still use gradient descent to find the optimal solution. The downside, however, is that the
summation in the social term takes time cubic in n to run. This is not scalable, since n is usually
large in our dataset and we are solving the problem with an iterative method. So we need to
parallelize the optimization to decrease the total time needed to find a solution. Our approach is
to divide the original data matrix R into k by k blocks, and to divide U and V into the
corresponding k blocks. Each block of R can be estimated by the product of the corresponding blocks
of U and V, and it does not depend on any other blocks. Hence we can solve the subproblems for each
block in parallel and finally combine the blocks to form the final result. Note that we haven't come
up with a good way to merge results from different blocks, so right now we only use the blocks on
the diagonal of R to avoid the merging step.
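A minimal sketch of this diagonal-blocks scheme, using plain Python threads (we simulate parallelism with multithreading); the function names and the defaults for `lam` and `iters` are illustrative, and the social term is omitted here for brevity:

```python
import numpy as np
from threading import Thread

def sgd_factorize(R_block, r=20, lr=0.01, lam=0.1, iters=50):
    """SGD matrix factorization on a single block; the social term is omitted for brevity."""
    n, d = R_block.shape
    U = 0.1 * np.random.rand(n, r)
    V = 0.1 * np.random.rand(r, d)
    rows, cols = R_block.nonzero()   # indices of observed ratings only
    for _ in range(iters):
        for i, j in zip(rows, cols):
            Ui = U[i].copy()         # cache the old row so both updates use it
            e = R_block[i, j] - Ui @ V[:, j]
            U[i] += lr * (e * V[:, j] - lam * Ui)
            V[:, j] += lr * (e * Ui - lam * V[:, j])
    return U, V

def parallel_factorize(R, n_blocks=10, r=20):
    """Split R into n_blocks x n_blocks blocks; factorize only the diagonal blocks in parallel."""
    row_parts = np.array_split(np.arange(R.shape[0]), n_blocks)
    col_parts = np.array_split(np.arange(R.shape[1]), n_blocks)
    results = [None] * n_blocks

    def work(b):
        block = R[np.ix_(row_parts[b], col_parts[b])]
        results[b] = sgd_factorize(block, r=r)

    threads = [Thread(target=work, args=(b,)) for b in range(n_blocks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results  # one (U_b, V_b) pair per diagonal block
```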
The dataset we use is from MovieLens. It contains 1,000 users, 1,700 movies, and 100,000 ratings,
and can be downloaded from http://grouplens.org/datasets/movielens/. This dataset does not include
friendship relations between users, so we build a social network over the users from the similarity
of their ratings: if the ratings from two users are similar, we regard them as friends.
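A sketch of how such a network can be built; we do not fix the similarity measure or threshold in the report, so the cosine similarity and the 0.5 cutoff below are illustrative assumptions:

```python
import numpy as np

def build_friend_pairs(R, threshold=0.5):
    """Infer 'friend' pairs from rating similarity (MovieLens has no social ties).

    Missing ratings are treated as 0, so this is cosine similarity between the
    users' raw rating rows; both choices are assumptions, not an exact recipe.
    """
    norms = np.linalg.norm(R, axis=1, keepdims=True)
    norms[norms == 0] = 1.0                      # avoid dividing by zero
    X = R / norms
    S = X @ X.T                                  # n x n user-user cosine similarity
    n = R.shape[0]
    return [(i, f) for i in range(n) for f in range(i + 1, n) if S[i, f] > threshold]
```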
The parameter settings for our experiments are: learning rate 0.01, number of latent rules 20 (which
determines the inner dimension of U and V), and 10 blocks when parallelizing. All code is
implemented in Python, and we use Python multithreading to simulate the parallelization.
We use mean squared error (MSE) to measure performance; lower is better. We first ran our algorithms
on the whole dataset and calculated the MSE (training error). Then we divided the dataset into two
parts, 80% as training data and 20% as testing data, and calculated the MSE (testing error).
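For reference, a minimal sketch of the metric over a chosen set of entries (the boolean mask encodes which ratings belong to the training or testing split; the helper name is ours):

```python
import numpy as np

def mse(R, U, V, mask):
    """Mean squared error of the reconstruction U @ V over the entries where mask is True."""
    err = (R - U @ V)[mask]
    return float(np.mean(err ** 2))
```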
The left figure shows the performance of serial matrix factorization, while the right figure shows the performance of parallel matrix factorization.
Here are some observations from our results. First, for both serial and parallel matrix factorization, the training errors are always lower than the testing errors.
Second, the serial matrix factorization is better than the parallel version; this is because we may lose some information with the current parallel algorithm
(we only consider blocks on the diagonal of the original matrix).
Third, for the serial algorithm, as the regularization factor increases, the testing error first becomes smaller and then increases, which means our regularization works.
Also, the best result we get from the serial version of our algorithm has RMSE = 0.94 and MSE = 0.88.
Here are the running times of our algorithms:
Serial matrix factorization: 19.90s
Parallel matrix factorization: 2.19s
The parallel matrix factorization algorithm is roughly nine times faster than the serial algorithm (19.90s / 2.19s ≈ 9.1).