Greedy Optimization of Intersection-over-Union Accuracy

Prakriti Banik
Spring 2014 ECE 6504 Probabilistic Graphical Models: Class Project
Virginia Tech


In vision problems like category label segmentation we predict structured objects. In structured prediction, we are interested in the prediction of a structured label, where the input is a complex object. Evaluation measures for such structured prediction problems are often task-specific. Intersection-over-Union (IOU) accuracy is one such well known evaluation measures (used by PASCAL VOC) which fixes background bias. Discriminative machine learning tries to train a system to optimize such specific measure of performance or loss.

In this project, the goal is to optimize this evaluation measure in a greedy fashion to maximize performance. The algorithm takes as input a model that produces properly calibrated probabilities and makes predictions that seek to maximize expected score on the test set with a greedy optimization. Optimization is done at corpus-level, as PASCAL VOC [
5] segmentation challenge uses corpus-level IOU accuracy. IOU accuracy is measured according to the following equation.

Where, * is the ground truth label and  is the predicted label for all superpixels, K is number of categories/classes.


We consider category label segmentation, which is a structured learning problem. Let G = (V, E) be a graph with nodes V and edges E defined over an output variable space , where each component of  takes one of K discrete values. DivMBest [1] algorithm reasons about such an exponentially large output space, makes a joint-prediction and outputs a diverse set of M plausible solutions. The steps of the approach are as follows

1.    Compute feature for each superpixel.

2.    Obtain a predictive distribution Q(), which fully factorizes using Softmax Regression.

3.    Take MAP solution of all images in the corpus.

4.    Compute surrogate gain [2] using Q using the following equation of

5.    For each k, among the superpixels not labeled as k, find the superpixel with highest probability to be label k in the corpus, and flip that superpixel to label k.

6.    Repeat until no more flips can improve the surrogate gain.


Features: In the proposed approach we take diverse M best solutions for each image and for each superpixel in G, we compute the histogram of K discrete labels assigned in all M solutions proposed by DivMBest algorithm and use this histogram as feature. An example image and its M=8 diverse best solutions are shown below.



Superpixels of an image is obtained by taking intersection of top 150 CPMC [3] segments. The superpixels of the above image is as follows.


We also experimented with 21 dimensional O2P [4] score along with the previously mentioned histogram as feature for each superpixel. The results are shown in the next section.


The IOU accuracy of 3-fold cross-validation vs. M DivMBest solutions plot for softmax regression on Pascal validation dataset is shown in figure (a). The feature used for this plot was both histogram and O2P score for each superpixel. The predictive distribution obtained here is used for optimizing the surrogate expected gain in the above mentioned greedy fashion. The IOU accuracy vs. Surrogate expected gain plot is shown in figure (b).

softmax            iou


From figure (b) we can observe that, MAP IOU accuracy is 45.1%. Initially the IOU accuracy goes up as surrogated expected gain goes up, but at iteration 17243, when the optimization algorithm converges, the IOU accuracy reaches around 43.9%. This is because the predictive distribution Q(), obtained from softmax regression does not seem to capture to the true distribution. A better feature representation might help to improve the predictive distribution, and the ultimate IOU accuracy.



1.      Batra, Dhruv, et al. "Diverse M-best solutions in Markov random fields." Computer Vision–ECCV 2012. Springer Berlin Heidelberg, 2012. 1-16.

2.      Tarlow, Daniel, and Ryan Prescott Adams. "Revisiting uncertainty in graph cut solutions." Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012.

3.      Carreira, Joao, and Cristian Sminchisescu. "Constrained parametric min-cuts for automatic object segmentation." Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on. IEEE, 2010.

4.    Carreira, Joao, et al. "Semantic segmentation with second-order pooling." Computer Vision–ECCV 2012. Springer Berlin Heidelberg, 2012. 430-443.

5.    Everingham, Mark, et al. "The pascal visual object classes (voc) challenge." International journal of computer vision 88.2 (2010): 303-338.

© Prakriti Banik

free hit counter