Modeling Domestic Political Crises via a Bayesian Predictive Model
Yaser Keneshloo & Antuan Byalik Spring 2014 ECE 6504 Probabilistic Graphical Models: Class Project Virginia Tech
Goal
The goal of this project was to build a model for predicting political crises in a
country of interest based on that country's previous internal interactions.
A political crisis is characterized by a particular set of interactions between actors.
We used the open-source GDELT dataset to train and test our model. This dataset contains geo-located
events with global coverage from 1979 to the present. Events fall into four classification
categories: verbal and material cooperation (event codes 1-10) and verbal and material
conflict (event codes 11-20). For each of these classifications there are 32 different actor types that
can interact, ranging from government and political actors to police and civilians. This dataset provided
a solid foundation for constructing a Bayesian model. Modeling the interactions between actors
as a graph, we constructed the line graph of user-to-user interactions, in which each "user" is the
connection between two actors, weighted by the classification integer of their interaction. The end goal was
to use a prediction model to generate this line graph at the next time step given all previous time steps.
The predicted line graph is then compared against existing graphs of known domestic political crises to output
a final prediction about whether a crisis is occurring in that country.
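As a rough sketch of this line-graph construction (using a hypothetical event format of (actor_a, actor_b, code) tuples; actual GDELT records carry many more fields):

```python
from itertools import combinations

def build_line_graph(events, n_actors=32):
    """Build the 'user' line graph for one time step.

    Each node ("user") is an unordered pair of actor types; its
    feature is a 20-dim indicator vector over the event codes
    (1-10 cooperation, 11-20 conflict).
    """
    # One node per unordered actor pair, excluding self-pairs:
    # 32*32 = 1024 ordered pairs, minus 32 self-pairs = 992, halved = 496.
    users = list(combinations(range(n_actors), 2))
    feats = {u: [0] * 20 for u in users}
    for a, b, code in events:          # hypothetical event format
        if a == b:
            continue                   # drop actor-to-itself events
        u = (min(a, b), max(a, b))     # undirected pair
        feats[u][code - 1] = 1         # mark interaction type
    return users, feats

users, feats = build_line_graph([(0, 1, 3), (1, 0, 15)])
print(len(users))  # 496 users for 32 actors
```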
Approach
Our approach follows the forecasting equation shown above: we predict the line graph at the current
time step given all previously observed line graphs. The H matrix in the equation is the user-feature
matrix, of size number of users (496) by number of features (20). The user count comes from the 32 possible actors:
32 x 32 = 1024 possible pairwise connections, minus the 32 actor-to-itself mappings (992), divided by 2 because
the graph is undirected and therefore symmetric (496). The 20 features are the 20 possible integer codes
for an actor interaction. The Y matrix is user by user and is the adjacency matrix of the line graph, i.e., the
quantity we want to predict. The equation shows that combining the current and past feature matrices with all
past user adjacency matrices (via the chain rule) is sufficient to produce the current user adjacency matrix. We used MATLAB to
build each of the supporting matrices and attempted to run the full double summation, but this quickly
proved far too computationally expensive, so we resorted to MCMC sampling. This was still slow to run
but produced the results discussed in the next section.
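For intuition, the LFRM model we build on scores a link between users i and j through their feature vectors, roughly P(y_ij = 1) = sigma(z_i^T W z_j). A minimal sketch of that link function (the weight matrix W here is random and purely illustrative, not a fitted model):

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def link_prob(z_i, z_j, W):
    """LFRM-style link probability: sigma(z_i^T W z_j)."""
    s = sum(z_i[a] * W[a][b] * z_j[b]
            for a in range(len(z_i)) for b in range(len(z_j)))
    return sigmoid(s)

random.seed(0)
K = 20  # number of features per user (the 20 interaction codes)
W = [[random.gauss(0, 1) for _ in range(K)] for _ in range(K)]
z_i = [1 if k == 3 else 0 for k in range(K)]   # user i has feature 3
z_j = [1 if k == 14 else 0 for k in range(K)]  # user j has feature 14
p = link_prob(z_i, z_j, W)
print(0.0 < p < 1.0)  # True: a valid edge probability
```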
Results
For this project, we ran our model on training data containing 158 months of data. Each month is the line graph of interactions between actors in Argentina; each graph has 496 nodes and 20 features. For the MCMC sampling, we followed the approach described in the paper "Nonparametric latent feature models for link prediction" (LFRM). The problem with the MCMC approach described in that paper is that it scales poorly to networks with more than 20 nodes. As a result, running the MCMC algorithm on only 10 months of data, with 200 burn-in iterations, took us about a day and a half. We used the network from the 11th month as our test data; below is a comparison on this test data between the paper's approach and the DRIFT algorithm as a baseline, using log-likelihood as the measure.
Log-likelihood for the next time step (11th month):
----------------------------------------------
LFRM: -166.7336
DRIFT: -223.9889
True Z_test Oracle: -16126.29
True Z_train Oracle: -16111.27
A higher log-likelihood indicates a better prediction, since it measures how likely the ground-truth network is under the predicted model.
As the table shows, and as expected from the results reported in the paper, DRIFT performs worse than LFRM in terms of log-likelihood. A newer method called LFP outperforms both of these approaches and might be a better choice for future projects.
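The comparison above can be sketched as a standard Bernoulli log-likelihood over predicted edge probabilities (this mirrors the usual link-prediction scoring, not the exact code from our experiments):

```python
import math

def bernoulli_log_likelihood(probs, truth, eps=1e-9):
    """Sum of log P(y_ij | p_ij) over the upper triangle (undirected)."""
    ll = 0.0
    n = len(truth)
    for i in range(n):
        for j in range(i + 1, n):
            p = min(max(probs[i][j], eps), 1 - eps)  # clamp for log
            ll += math.log(p) if truth[i][j] else math.log(1 - p)
    return ll

# Toy 3-node example: confident, correct predictions score higher.
truth = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
good = [[0, .9, .1], [.9, 0, .9], [.1, .9, 0]]
bad = [[0, .1, .9], [.1, 0, .1], [.9, .1, 0]]
print(bernoulli_log_likelihood(good, truth) >
      bernoulli_log_likelihood(bad, truth))  # True
```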
With this done, we have a predicted interaction network, and by comparing the structure of the predicted graph with the ground-truth data, we can classify it as either a graph that indicates a domestic political crisis (DPC) or a normal graph. For this, any graph similarity measure, such as a graph kernel, can be used to find the class closest to the predicted graph. Alternatively, one could apply a graph clustering algorithm over all the data and then find the cluster closest to this graph.
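One minimal stand-in for such a similarity measure is Jaccard overlap of edge sets (a real graph kernel, e.g. a random-walk or Weisfeiler-Lehman kernel, would replace this; the reference graphs below are toy data):

```python
def edge_jaccard(edges_a, edges_b):
    """Jaccard similarity between two undirected edge sets."""
    a = {tuple(sorted(e)) for e in edges_a}
    b = {tuple(sorted(e)) for e in edges_b}
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def classify(predicted_edges, crisis_graphs, normal_graphs):
    """Label the predicted graph by its nearest reference graph."""
    best_crisis = max(edge_jaccard(predicted_edges, g) for g in crisis_graphs)
    best_normal = max(edge_jaccard(predicted_edges, g) for g in normal_graphs)
    return "DPC" if best_crisis >= best_normal else "normal"

pred = [(0, 1), (1, 2)]
label = classify(pred, [[(0, 1), (1, 2), (2, 3)]], [[(4, 5)]])
print(label)  # DPC
```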
The code we wrote for this project is available here: Code Download