ECE 5554 / ECE 4554: Computer Vision Fall 2016
Instructions
Due date: 11:59pm on Wednesday, November 2nd, 2016
Handin: through Canvas
Starter code and data: hw4.zip
Deliverables:
Please prepare your answer sheet with the filename FirstName_LastName_HW4.pdf.
Include all the code you implement in each problem. Remember to set relative paths so that we can run your code out of the box.
Compress the code and answer sheet into FirstName_LastName_HW4.zip before uploading it to Canvas.
Part 1: SLIC Superpixels (35 points)
Overview
Superpixel algorithms group pixels into perceptually meaningful regions while respecting
potential object contours, and thereby can replace the rigid pixel grid structure. Due to the reduced
complexity, superpixels are becoming popular for various computer vision applications, e.g.,
multiclass object segmentation, depth estimation, human pose estimation, and object localization.
In this problem, you will implement a simple superpixel algorithm called Simple Linear Iterative
Clustering (SLIC) that clusters pixels in the five-dimensional color and pixel coordinate space (e.g.,
\(r, g, b, x, y\)). The algorithm starts with a collection of K cluster centers initialized on an equally
sampled regular grid over the image of N pixels. For each cluster, you define a localized window of
size 2S x 2S centered at the cluster center, where \(S = \sqrt{N/K}\) is roughly the spacing between the
seed cluster centers. Then, for each pixel within the 2S x 2S local window, you check whether it should
be assigned to that cluster center (by comparing the distance in 5D space to the cluster center).
Once you have looped through all the clusters, you update each cluster center by averaging over its
members. Iterate the pixel-to-cluster assignment process until convergence or until the maximum number of iterations is reached.
For more details, please see SLIC Superpixels Compared to State-of-the-art Superpixel Methods,
PAMI 2012.
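The assignment/update loop above can be sketched as follows. This is a minimal illustration in Python/NumPy (the assignment itself may be done in MATLAB); the function name, the feature layout \([r, g, b, x, y]\), and the weight `m` balancing color against spatial distance are assumptions for the sketch, not part of the assignment.

```python
import numpy as np

def slic_iteration(features, centers, S, m=10.0):
    """One SLIC assignment/update pass (sketch).

    features: (H, W, 5) array of [r, g, b, x, y] per pixel
    centers:  (K, 5) array of cluster centers in the same 5D space
    S:        grid spacing, roughly sqrt(N / K)
    m:        assumed weight trading off color vs. spatial distance
    """
    H, W, _ = features.shape
    labels = -np.ones((H, W), dtype=int)
    best = np.full((H, W), np.inf)

    for k, c in enumerate(centers):
        cx, cy = int(c[3]), int(c[4])
        # restrict the search to the 2S x 2S window around the center
        x0, x1 = max(cx - S, 0), min(cx + S, W)
        y0, y1 = max(cy - S, 0), min(cy + S, H)
        win = features[y0:y1, x0:x1]
        d_color = np.sum((win[..., :3] - c[:3]) ** 2, axis=-1)
        d_space = np.sum((win[..., 3:] - c[3:]) ** 2, axis=-1)
        d = d_color + (m / S) ** 2 * d_space      # combined 5D distance
        mask = d < best[y0:y1, x0:x1]
        best[y0:y1, x0:x1][mask] = d[mask]
        labels[y0:y1, x0:x1][mask] = k

    # update step: each center becomes the mean of its assigned pixels
    new_centers = np.array([features[labels == k].mean(axis=0)
                            for k in range(len(centers))])
    return labels, new_centers
```

You would call this repeatedly until the centers stop moving or a maximum iteration count is hit; the \((m/S)^2\) scaling is one common way to make the spatial term comparable to the color term.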
Writeup
(5 points) (a) Explain your distance function for measuring the similarity between a pixel and cluster in the 5D space.
(5 points) (b) Choose one image, try three different weights on the color and spatial feature and show the three segmentation results. Describe what you observe.
(5 points) (c) Choose one image, show the error map (1) at the initialization and (2) at convergence.
Note: error = distance to cluster center in the 5D space.
(5 points) (d) Choose one image, show three superpixel results with different values of \(K\), e.g., \(64,
256, 1024\), and report the run time for each \(K\) (use tic, toc).
(15 points) (e) Run your algorithm on the subset (50 images) of the Berkeley Segmentation Dataset (BSD) and
report two types of metrics, (1) boundary recall and (2) under-segmentation error, with \(K =
64, 256, 1024\). Also, report the averaged run time per image for the BSD subset.
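As a starting point for the evaluation, boundary recall is commonly defined as the fraction of ground-truth boundary pixels that fall within a small tolerance of a superpixel boundary. The sketch below assumes that definition and a 2-pixel tolerance; check the SLIC paper for the exact protocol you should match.

```python
import numpy as np

def boundary_recall(gt_boundary, sp_labels, tol=2):
    """Fraction of ground-truth boundary pixels within `tol` pixels
    of a superpixel boundary (one common definition; verify against
    the paper before reporting numbers).

    gt_boundary: (H, W) boolean ground-truth boundary map
    sp_labels:   (H, W) integer superpixel label map
    """
    # superpixel boundary: pixels whose label differs from a neighbor
    sp_b = np.zeros_like(sp_labels, dtype=bool)
    sp_b[:-1, :] |= sp_labels[:-1, :] != sp_labels[1:, :]
    sp_b[:, :-1] |= sp_labels[:, :-1] != sp_labels[:, 1:]

    # dilate the superpixel boundary by `tol` with repeated 4-neighbor growth
    dil = sp_b.copy()
    for _ in range(tol):
        grown = dil.copy()
        grown[1:, :] |= dil[:-1, :]
        grown[:-1, :] |= dil[1:, :]
        grown[:, 1:] |= dil[:, :-1]
        grown[:, :-1] |= dil[:, 1:]
        dil = grown

    hits = np.logical_and(gt_boundary, dil).sum()
    total = gt_boundary.sum()
    return hits / total if total > 0 else 1.0
```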
Part 2: EM Algorithm: Dealing with Bad Annotations (35 points)
Dealing with noisy annotations is a common problem in computer vision, especially when using
crowdsourcing tools like Amazon's Mechanical Turk. For this problem, you've collected photo aesthetic
ratings for 150 images. Each image is labeled 5 times by a total of 25 annotators (each annotator
provided 30 labels). Each label consists of a continuous score from 0 (unattractive) to 10 (attractive).
The problem is that some users do not understand instructions or are trying to get paid without
attending to the image. These “bad” annotators assign a label uniformly at random from 0 to 10.
Other “good” annotators assign a label to the \(i^{th}\) image with mean \(\mu_i\) and standard deviation \(\sigma\)
(\(\sigma\) is the same for all images). Your goal is to solve for the most likely image scores and to figure out
which annotators are trying to cheat you.
In your writeup, use the following notation:
\(x_{ij} \in [0, 10]\): the score for \(i^{th}\) image from the \(j^{th}\) annotator
\(m_j \in \{0, 1\}\): whether each \(j^{th}\) annotator is “good” (\(m_j = 1\)) or “bad” (\(m_j = 0\))
\(P(x_{ij} \mid m_j = 0) = \frac{1}{10}\): uniform distribution for bad annotators
\(P(x_{ij} \mid m_j = 1; \mu_i, \sigma) = \frac{1}{\sqrt{2\pi \sigma^2}} \exp\left(-\frac{(x_{ij} - \mu_i)^2}{2\sigma^2}\right)\): normal distribution for good annotators
\(P(m_j = 1; \beta ) = \beta\): prior probability for being a good annotator
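As a sketch of the structure (not the full derivation you are asked to show), the E-step posterior that annotator \(j\) is good follows from Bayes' rule applied to all of that annotator's scores:

```latex
\[
P(m_j = 1 \mid \{x_{ij}\}_i)
  = \frac{\beta \prod_{i} P(x_{ij} \mid m_j = 1; \mu_i, \sigma)}
         {\beta \prod_{i} P(x_{ij} \mid m_j = 1; \mu_i, \sigma)
          + (1 - \beta) \prod_{i} P(x_{ij} \mid m_j = 0)}
\]
```

where the products run over the images labeled by annotator \(j\). The M-step then re-estimates \(\mu_i\), \(\sigma\), and \(\beta\) by maximizing the expected complete-data log-likelihood with these posteriors held fixed.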
Derivation of EM Algorithm (20 points)
Derive the EM algorithm to solve for each \(\mu_i\), each \(m_j\), \(\sigma\), and \(\beta\).
Show the major steps of the derivation and make it clear how to compute each variable in the update step.
Application to Data (15 points)
You can find the file annotation_data.mat containing:
annotator_ids(i) provides the ID of the annotator for the \(i^{th}\) annotation entry
image_ids(i) provides the ID of the image for the \(i^{th}\) annotation entry
annotation_scores(i) provides the \(i^{th}\) annotated score
Implement the EM algorithm to solve for all variables. You will need to initialize the probability that
each annotator is good; set the initial value to 0.5 for each annotator.
Writeup
(a) Report the indices corresponding to “bad” annotators (\(j\) for which \(m_j\) is most likely \(0\))
(b) Report the estimated value of \(\sigma\)
(c) Plot the estimated means \(\mu_i\) of the image scores (e.g., plot(1:150, mean_scores(1:150)))
Hint
For numerical stability, you should use logsumexp.m when computing the
likelihood that an annotator is “good”, given the current parameters. The joint probability of
each annotator's scores and “goodness” will be very small (close to zero by numerical precision).
Thus, you should compute the log probabilities and use logsumexp to compute the denominator.
Another useful function is normpdf.
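The idea behind the provided logsumexp.m can be illustrated in a few lines. This is a hedged Python sketch of the standard max-shift trick, not the provided MATLAB code:

```python
import numpy as np

def logsumexp(log_vals):
    """Compute log(sum(exp(log_vals))) without underflow by factoring
    out the maximum: log sum exp(a_i) = a_max + log sum exp(a_i - a_max)."""
    a = np.asarray(log_vals, dtype=float)
    a_max = a.max()
    return a_max + np.log(np.sum(np.exp(a - a_max)))
```

For example, the posterior that an annotator is “good” can then be computed as `exp(log_num_good - logsumexp([log_num_good, log_num_bad]))`, which stays finite even when both joint probabilities underflow to zero in linear scale.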
In each iteration, show the bar plot of the probability of “good” for each annotator and the plot of the mean score for each image. By the end, you should be very confident about which
annotators are “good”.
Part 3: Graphcut Segmentation (30 points)
Let us apply Graphcuts for foreground/background segmentation. In the “cat” image, you are given
a rough polygon of a foreground cat. Apply Graphcuts to get a better segmentation.
First, you need an energy function. Your energy function should include a unary term, a data-independent smoothing term, and a contrast-sensitive smoothing term. Your unary terms should
be \(\log\left[\frac{P(\mathrm{pixel} \mid \mathrm{foreground})}{P(\mathrm{pixel} \mid \mathrm{background})}\right]\).
Your pairwise term should include uniform smoothing and the contrast-sensitive term. To construct the unary term, use the provided polygon to obtain an estimate of
the foreground and background color likelihoods. You may choose the likelihood distribution (e.g., color
histograms or a color mixture of Gaussians).
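If you take the color-histogram route, the likelihood estimate can be sketched as below. This is an illustrative Python/NumPy version under assumed choices (joint RGB histogram, 8 bins per channel, Laplace smoothing); the function name and parameters are not part of the assignment.

```python
import numpy as np

def color_histogram_likelihood(img, fg_mask, bins=8):
    """Estimate P(color | foreground) and P(color | background) with
    joint RGB histograms (one simple choice of likelihood model).

    img:     (H, W, 3) uint8 image
    fg_mask: (H, W) boolean mask from the rough polygon
    Returns per-pixel likelihood maps, each (H, W).
    """
    # quantize each channel into `bins` levels and form a joint bin index
    q = (img.astype(int) * bins) // 256          # values in 0..bins-1
    idx = q[..., 0] * bins * bins + q[..., 1] * bins + q[..., 2]

    def normalized_hist(mask):
        h = np.bincount(idx[mask].ravel(), minlength=bins ** 3).astype(float)
        h += 1.0                                  # Laplace smoothing
        return h / h.sum()

    p_fg = normalized_hist(fg_mask)
    p_bg = normalized_hist(~fg_mask)
    return p_fg[idx], p_bg[idx]
```

The log-ratio of the two returned maps then gives the unary term at every pixel.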
Apply graph cut code for segmentation. You must define the graph structure and the unary and pairwise
terms, and use the provided graph cut code or another package of your choice.
Writeup
(5 points) (a) Explain your foreground and background likelihood function.
(5 points) (b) Write the unary and pairwise terms, as well as the whole energy function, in math expressions.
(10 points) (c) Show your foreground and background likelihood maps. Display \(P(\mathrm{foreground} \mid \mathrm{pixel})\) as an intensity
map (bright = confident foreground).
(10 points) (d) Final segmentation. Create an image for which the background pixels are blue, and the foreground
pixels have the color of the input image.
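The display step for (d) is a small compositing operation; a possible Python/NumPy sketch (function name assumed for illustration):

```python
import numpy as np

def composite_blue_background(img, fg_mask):
    """Keep foreground pixels; paint background pixels pure blue.

    img:     (H, W, 3) RGB image
    fg_mask: (H, W) boolean segmentation from the graph cut
    """
    out = np.zeros_like(img)
    out[...] = [0, 0, 255]          # RGB blue everywhere
    out[fg_mask] = img[fg_mask]     # restore the foreground colors
    return out
```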
Extra credit / graduate credit (max possible 30 points extra credit)
SLIC Superpixels
up to 15 points: Try to improve your results from part 1(e). You may try a different
color space (e.g., CIELab, HSV) (see Sec. 4.5 in the paper),
richer image features (e.g., gradients), or any other ideas you come up with. Report the
accuracy on boundary recall and under-segmentation error with K = 256. Compare the
results with part 1(e) and explain why you get better results.
up to 15 points: Implement Adaptive-SLIC (the parameter-free version of SLIC). See Sec. 4.5.E in the paper for more details.
Test your AdaptiveSLIC implementation on two images and compare the results with SLIC.
Graphcut Segmentation
In your answer sheet, describe the extra points under a separate heading.
Acknowledgements
This homework is adapted from projects developed by Derek Hoiem (UIUC).
