Pose estimation using a hill-climbing structure learning approach
Ujwal Krothapalli Spring 2014 ECE 6504 Probabilistic Graphical Models: Class Project Virginia Tech
Goal
This project applies structure learning techniques such as the Chow-Liu tree algorithm and inference algorithms such as loopy
belief propagation to the human pose estimation problem. Instead of using the 'grammar' that is explicitly encoded in current pose
estimation methods, the structure of the skeleton is learned using a minimum spanning tree.
Approach
The implementation of Yang and Ramanan for pose estimation was used. After obtaining the x and y coordinates of the parts in the training images, a minimum spanning tree was computed.
This project explores the use of the Chow-Liu algorithm to build a tree-structured approximation that minimizes the Kullback-Leibler divergence from the actual distribution of parts in part-based human pose estimation algorithms. The feature extraction pipeline is borrowed from the implementation of Yang and Ramanan. Histogram of Oriented Gradients (HOG) features are used to define the filters for each part. The explicit grammar that has been used to detect humans as skeletons is a tree structure based on the kinematic structure of the human body. This tree may not be the optimal way to describe the distribution of the human parts in a given image.
The Chow-Liu tree for the 18-part BUFFY model, shown as an 18 x 18 matrix whose nonzero entries indicate the edges:
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
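The structure-learning step described above can be illustrated with a minimal sketch. It is not the code used in this project: it assumes the part coordinates have already been discretized into integer states, and the names `mutual_information` and `chow_liu_tree` are hypothetical. Minimizing the Kullback-Leibler divergence over tree structures is equivalent to choosing the spanning tree with the largest total pairwise mutual information, which is the same as a minimum spanning tree on the negated weights.

```python
import numpy as np


def mutual_information(a, b):
    """Empirical mutual information (in nats) of two discrete 1-D sequences."""
    a_vals, a_idx = np.unique(np.asarray(a).ravel(), return_inverse=True)
    b_vals, b_idx = np.unique(np.asarray(b).ravel(), return_inverse=True)
    joint = np.zeros((len(a_vals), len(b_vals)))
    np.add.at(joint, (a_idx, b_idx), 1.0)      # co-occurrence counts
    joint /= joint.sum()                       # joint distribution
    pa = joint.sum(axis=1, keepdims=True)      # marginal of a
    pb = joint.sum(axis=0, keepdims=True)      # marginal of b
    nz = joint > 0                             # skip empty cells (0 log 0 = 0)
    return float(np.sum(joint[nz] * np.log(joint[nz] / (pa @ pb)[nz])))


def chow_liu_tree(data):
    """Edges (i, j) of a maximum mutual-information spanning tree.

    data has shape (n_samples, n_vars) and holds discrete states.
    """
    n_vars = data.shape[1]
    mi = np.zeros((n_vars, n_vars))
    for i in range(n_vars):
        for j in range(i + 1, n_vars):
            mi[i, j] = mi[j, i] = mutual_information(data[:, i], data[:, j])
    # Prim's algorithm: grow the tree from variable 0, always adding the
    # highest-MI edge that connects a variable not yet in the tree.
    in_tree = [0]
    edges = []
    while len(in_tree) < n_vars:
        best = None
        for i in in_tree:
            for j in range(n_vars):
                if j not in in_tree and (best is None or mi[i, j] > mi[best]):
                    best = (i, j)
        edges.append(best)
        in_tree.append(best[1])
    return edges


# Synthetic example (assumed data): a noisy binary chain x0 -> x1 -> x2 -> x3.
rng = np.random.default_rng(0)
n = 5000


def noisy_copy(x):
    """Copy x, flipping each bit with probability 0.1."""
    return np.where(rng.random(n) < 0.1, 1 - x, x)


x0 = rng.integers(0, 2, n)
x1 = noisy_copy(x0)
x2 = noisy_copy(x1)
x3 = noisy_copy(x2)
data = np.column_stack([x0, x1, x2, x3])
edges = chow_liu_tree(data)
print(edges)
```

On this synthetic chain the adjacent variables share far more mutual information than non-adjacent ones, so the recovered edges should be the three chain links. For real part locations, the choice of discretization (or a continuous mutual-information estimate) would need to be decided separately.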
After obtaining the tree structure, loopy belief propagation was run on it. A Potts potential model was used for the tree, with two states per node.
The marginal beliefs obtained were:
0.9000 0.1000
0.8200 0.1800
0.7560 0.2440
0.6638 0.3362
0.6311 0.3689
0.6049 0.3951
0.5839 0.4161
0.7048 0.2952
0.6638 0.3362
0.6311 0.3689
0.7560 0.2440
0.6638 0.3362
0.6311 0.3689
0.6049 0.3951
0.5839 0.4161
0.7048 0.2952
0.6638 0.3362
0.6311 0.3689
The pairwise beliefs are listed below.
Edgebelief(:,:,1) =
0.8100 0.0900
0.0100 0.0900
Edgebelief(:,:,2) =
0.7380 0.0820
0.0180 0.1620
Edgebelief(:,:,3) =
0.7380 0.0820
0.0180 0.1620
Edgebelief(:,:,4) =
0.6804 0.0756
0.0244 0.2196
Edgebelief(:,:,5) =
0.5975 0.0664
0.0336 0.3025
Edgebelief(:,:,6) =
0.6343 0.0295
0.0705 0.2657
Edgebelief(:,:,7) =
0.5680 0.0631
0.0369 0.3320
Edgebelief(:,:,8) =
0.5444 0.0605
0.0395 0.3556
Edgebelief(:,:,9) =
0.6343 0.0705
0.0295 0.2657
Edgebelief(:,:,10) =
0.5975 0.0664
0.0336 0.3025
Edgebelief(:,:,11) =
0.6804 0.0756
0.0244 0.2196
Edgebelief(:,:,12) =
0.5975 0.0664
0.0336 0.3025
Edgebelief(:,:,13) =
0.6343 0.0295
0.0705 0.2657
Edgebelief(:,:,14) =
0.5680 0.0631
0.0369 0.3320
Edgebelief(:,:,15) =
0.5444 0.0605
0.0395 0.3556
Edgebelief(:,:,16) =
0.6343 0.0705
0.0295 0.2657
Edgebelief(:,:,17) =
0.5975 0.0664
0.0336 0.3025
The free energy, reported as logZ, was zero to within floating-point precision:
logZ = -8.8818e-16
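The inference step above can be sketched as follows. On a tree, loopy belief propagation reduces to exact sum-product message passing, which is what this sketch implements. The potentials are assumptions chosen for illustration: a root prior of [0.9, 0.1] and a Potts-style pairwise table with 0.9 on the diagonal and 0.1 off it. The function name `tree_sum_product` and the four-node chain are hypothetical and are not taken from the project code.

```python
import numpy as np


def tree_sum_product(parent, unary, pairwise):
    """Exact sum-product marginals on a rooted tree.

    parent[i] is the parent of node i (-1 for the root, which must be node 0);
    every parent index must be smaller than its child's index.
    unary has shape (n, k). pairwise has shape (k, k), is indexed
    [parent_state, child_state], and is shared by all edges.
    """
    n, k = unary.shape
    up = np.ones((n, k))        # message sent from node i to its parent
    prod_up = unary.copy()      # unary potential times all child messages
    for i in range(n - 1, 0, -1):            # leaves towards the root
        up[i] = pairwise @ prod_up[i]
        prod_up[parent[i]] *= up[i]
    down = np.ones((n, k))      # message received by node i from its parent
    for i in range(1, n):                    # root towards the leaves
        p = parent[i]
        # Parent's belief with the message that came from child i divided out.
        without_i = down[p] * prod_up[p] / up[i]
        down[i] = without_i @ pairwise
    beliefs = down * prod_up
    return beliefs / beliefs.sum(axis=1, keepdims=True)


# Assumed example: a four-node chain 0 - 1 - 2 - 3 with binary states.
parent = [-1, 0, 1, 2]
unary = np.ones((4, 2))
unary[0] = [0.9, 0.1]                        # assumed root prior
potts = np.array([[0.9, 0.1],
                  [0.1, 0.9]])               # assumed Potts-style potential
beliefs = tree_sum_product(parent, unary, potts)
print(np.round(beliefs, 4))
```

Under these assumed potentials the first-state marginals along the chain are 0.9, 0.82, 0.756 and 0.7048, values that also appear in the marginal beliefs listed above. This suggests, but does not confirm, that similar potentials were used in the project. Because the assumed pairwise table has rows that sum to one and the prior is normalized, the partition function is 1, which is consistent with a logZ of zero.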
Results
The new tree structure produced an AP of 76, as opposed to the original AP of 78.5, but the wrist accuracy improved by 8%.
AP values for the Chow-Liu tree:
Head    Shoulders    Elbows    Wrists    Hips
85.7    86.2         79.4      67.3      61.1
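As a quick consistency check, the overall AP of 76 quoted for the new tree appears to be the mean of the five per-part values. This is an assumption, since the report does not state how the overall figure was computed. The arithmetic can be reproduced with a few lines:

```python
# Per-part AP values for the Chow-Liu tree, copied from the results above.
ap = {"Head": 85.7, "Shoulders": 86.2, "Elbows": 79.4, "Wrists": 67.3, "Hips": 61.1}

# Assumption: the overall AP is the unweighted mean over the five parts.
mean_ap = sum(ap.values()) / len(ap)
print(round(mean_ap, 2))
```

The mean is 75.94, which rounds to the reported 76.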
AP values for Head, Shoulders, Elbows, Wrists and Hips with the original tree