Relative Attributes

 

Marr Prize (Best Paper Award) Winner, ICCV 2011

 

Devi Parikh and Kristen Grauman

   


 

  

“Who in the rainbow can draw the line where the violet tint ends and the orange tint begins? Distinctly we see the difference of the colors, but where exactly does the one first blendingly enter into the other? So with sanity and insanity.”
 
—Herman Melville, Billy Budd 

     

 

[paper]    [data]     [code]     [demos]    [slides]     [talk (video)]     [poster]

   


Abstract


Human-nameable visual "attributes" can benefit various recognition tasks. However, existing techniques restrict these properties to categorical labels (for example, a person is 'smiling' or not, a scene is 'dry' or not), and thus fail to capture more general semantic relationships. We propose to model relative attributes.  Given training data stating how object/scene categories relate according to different attributes, we learn a ranking function per attribute. The learned ranking functions predict the relative strength of each property in novel images. We then build a generative model over the joint space of attribute ranking outputs, and propose a novel form of zero-shot learning in which the supervisor relates the unseen object category to previously seen objects via attributes (for example, 'bears are furrier than giraffes').  We further show how the proposed relative attributes enable richer textual descriptions for new images, which in practice are more precise for human interpretation.  We demonstrate the approach on datasets of faces and natural scenes, and show its clear advantages over traditional binary attribute prediction for these new tasks. 
 

  


Motivation


Binary attributes are restrictive and can be unnatural. In the examples above, while one can characterize the top-left image as natural and the top-right image as man-made, how would you describe the image in the top-center? The only meaningful way to characterize it is with respect to the other images: it is less natural than the image on the left, but more natural than the image on the right.

 

  


Proposal


In this work, we propose to model relative attributes. As opposed to predicting the presence of an attribute, a relative attribute indicates the strength of an attribute in an image with respect to other images. In addition to being more natural, relative attributes offer a richer mode of communication, thus allowing access to more detailed human supervision (and so potentially higher recognition accuracy), as well as the ability to generate more informative descriptions of novel images. 

 

We devise an approach that learns a ranking function for each attribute, given relative similarity constraints on pairs of examples (or, more generally, a partial ordering on some examples). The learned ranking function estimates a real-valued rank for novel images, indicating the relative strength of the attribute in each.

 

We introduce novel forms of zero-shot learning and image description that exploit the relative attribute predictions.

 

  


Approach


 

Learning relative attributes: Each relative attribute is learnt via a learning to rank formulation, given comparative supervision, as shown below:

 


  

 

The figure below contrasts a wide-margin ranking function (right), which enforces the desired ordering on the training points (1-6), with a wide-margin binary classifier (left), which only separates the two classes (+ and -) and does not necessarily preserve the desired ordering on the points:
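The learning-to-rank step can be sketched as follows. This is a minimal subgradient-descent version of the pairwise RankSVM objective, with toy data; the paper instead uses a Newton-method solver adapted from Chapelle's code, and additionally supports similarity constraints, which are omitted here.

```python
import numpy as np

def learn_ranker(X, ordered_pairs, C=1.0, lr=0.01, epochs=200):
    """Learn a linear ranking function w^T x from pairwise constraints.

    ordered_pairs: list of (i, j), meaning image i shows MORE of the
    attribute than image j, i.e. we want w^T x_i > w^T x_j + 1.
    """
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        grad = w.copy()                      # gradient of 0.5 * ||w||^2
        for i, j in ordered_pairs:
            if w @ (X[i] - X[j]) < 1:        # hinge loss is active
                grad -= C * (X[i] - X[j])
        w -= lr * grad
    return w

# Toy data: attribute strength grows with the first feature dimension.
X = np.array([[0.1, 0.5],
              [0.4, 0.2],
              [0.9, 0.7]])
pairs = [(2, 1), (1, 0), (2, 0)]             # image 2 > image 1 > image 0
w = learn_ranker(X, pairs)
ranks = X @ w                                # relative attribute strengths
assert ranks[2] > ranks[1] > ranks[0]
```

The real-valued outputs `ranks` play the role of the attribute strengths used by the zero-shot and description components below.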

 

 


  

 

 

 

Novel zero-shot learning: We study the following set-up:

We first train a set of relative attributes using the supervision provided on the seen categories; these attributes can also be pre-trained from external data. We then build a generative model (a Gaussian) for each seen category over the relative-attribute responses to its images. Finally, we infer the parameters of the generative models of unseen categories from their relative descriptions with respect to the seen categories. A visualization of the simple approach we employ is shown below:


 

 

 

 

A test image is assigned to the category with the maximum likelihood.
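A minimal sketch of this pipeline, with hypothetical category names and toy attribute scores. Here the unseen category's mean is simply placed midway between the two seen means it is described against, and its covariance is the average of theirs; the paper's full inference rules (e.g. for categories bounded on only one side) differ.

```python
import numpy as np

# Toy attribute-rank outputs: rows = images, cols = attribute scores
# ("natural", "open") from the learned rankers. Values are hypothetical.
seen = {
    "forest":       np.array([[2.0, 0.9], [2.2, 1.1], [1.8, 1.0]]),
    "tallbuilding": np.array([[-1.0, 3.0], [-1.2, 2.8], [-0.8, 3.2]]),
}

# Fit one Gaussian per seen category in the joint attribute-rank space.
means = {c: r.mean(axis=0) for c, r in seen.items()}
covs = {c: np.cov(r.T) + 1e-3 * np.eye(r.shape[1]) for c, r in seen.items()}

# Unseen category "coast", described only relative to seen categories
# (e.g. "more natural than tallbuilding, less natural than forest"):
# its mean is placed between the two seen means on each attribute, and
# its covariance is the average of the seen covariances.
unseen_mean = 0.5 * (means["forest"] + means["tallbuilding"])
unseen_cov = 0.5 * (covs["forest"] + covs["tallbuilding"])

def log_likelihood(x, mu, cov):
    d = x - mu
    return -0.5 * (d @ np.linalg.solve(cov, d) + np.log(np.linalg.det(cov)))

def classify(x):
    # Test image is assigned to the maximum-likelihood category,
    # seen or unseen.
    models = {c: (means[c], covs[c]) for c in seen}
    models["coast"] = (unseen_mean, unseen_cov)
    return max(models, key=lambda c: log_likelihood(x, *models[c]))

test_ranks = np.array([0.5, 2.0])    # attribute scores of a test image
print(classify(test_ranks))          # prints "coast"
```

Note that "coast" is never observed at training time: its Gaussian is built entirely from the relative description and the seen categories' models.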

   

Automatically generating relative textual descriptions of images: Given an image I to be described, we evaluate all learnt ranking functions on I. For each attribute, we identify two reference images that lie on either side of I and are neither too far from nor too close to I. Image I is then described relative to these two reference images, as shown below:
 

 

  

As seen above, in addition to describing an image relative to other images, our approach can also describe it relative to other categories, resulting in a purely textual description. Clearly, the relative descriptions are more precise and informative than conventional binary descriptions.
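The reference-selection step can be sketched as follows. The category names and scores are hypothetical, and the "not too close, not too far" rule used here (score gap between 10% and 50% of the reference score range) is an illustrative stand-in for the paper's selection procedure:

```python
import numpy as np

def relative_description(score, ref_scores, ref_names, attr="natural"):
    """Describe an image's attribute strength relative to reference images.

    score: the learned ranker's output on the image being described.
    For each side of `score`, pick a reference image whose score gap is
    neither too small nor too large (here: between 10% and 50% of the
    reference score range -- an illustrative threshold, not the paper's).
    """
    gaps = np.abs(ref_scores - score)
    span = ref_scores.max() - ref_scores.min()
    lo, hi = 0.10 * span, 0.50 * span
    phrases = []
    for mask, word in [(ref_scores < score, "more"), (ref_scores > score, "less")]:
        ok = np.flatnonzero(mask & (gaps > lo) & (gaps < hi))
        if ok.size:
            pick = ok[np.argmin(gaps[ok])]   # closest admissible reference
            phrases.append(f"{word} {attr} than {ref_names[pick]}")
    return ", ".join(phrases)

# Hypothetical "natural" ranks for four reference images.
names = np.array(["tallbuilding", "insidecity", "highway", "coast"])
scores = np.array([-2.0, -0.5, 1.0, 2.5])
print(relative_description(0.2, scores, names))
# prints "more natural than insidecity, less natural than highway"
```

Running this per attribute and joining the resulting phrases yields descriptions of the form shown in the tables below.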

 


Experiments and Results


We conduct experiments on two datasets: 

(1) Outdoor Scene Recognition (OSR) containing 2688 images from 8 categories: coast C, forest F, highway H, inside-city I, mountain M, open-country O, street S and tall-building T. We use gist features to represent the images.

(2) A subset of the Public Figures Face Database (PubFig) containing 772 images from 8 categories: Alex Rodriguez A, Clive Owen C, Hugh Laurie H, Jared Leto J, Miley Cyrus M, Scarlett Johansson S, Viggo Mortensen V and Zac Efron Z. We use concatenated gist and color features to represent the images.

 

The list of attributes used for each dataset, along with the binary and relative attribute annotations, is shown below:

   


 

 

Zero-shot learning:

 

We compare our proposed approach to two baselines. The first is Score-based Relative Attributes (SRA). This baseline is the same as our approach, except it uses the scores of a binary classifier (binary attributes) instead of the scores of a ranking function. This baseline helps evaluate the need for a ranking function to best model relative attributes. Our second baseline is the Direct Attribute Prediction (DAP) model introduced by Lampert et al. in CVPR 2009. This baseline helps evaluate the benefits of a relative treatment of attributes as opposed to a categorical one. We evaluate these approaches for varying numbers of unseen categories, varying amounts of data used to train the attributes, varying numbers of attributes used to describe the unseen categories, and varying levels of 'looseness' in the descriptions of unseen categories. Details of the experimental set-up can be found in our paper. Results are shown below:

 

 


 

Auto-generated image descriptions:

 

In order to evaluate the quality of our relative image descriptions compared to their binary counterparts, we conducted a human study. We generated a description of each image using our approach, as well as with the baseline binary attributes. We presented subjects with this description, along with three images, one of which was the image being described. The subjects' task was to rank the three images by how likely each was to be the one described. The more precise the description, the better the subjects' chances of identifying the correct image. An illustration of a task presented to subjects is shown below:


 

 

The results of the study are shown below. We see that subjects can identify the correct image more accurately using our proposed relative attributes, as compared to the binary attributes.

 


 

 

 

Example binary descriptions of images as well as descriptions relative to categories are shown below:

  

        

Image 1 (OSR scene)
Binary description: not natural; not open; perspective
Relative description: more natural than tallbuilding, less natural than forest; more open than tallbuilding, less open than coast; more perspective than tallbuilding

Image 2 (OSR scene)
Binary description: not natural; not open; perspective
Relative description: more natural than insidecity, less natural than highway; more open than street, less open than coast; more perspective than highway, less perspective than insidecity

Image 3 (OSR scene)
Binary description: natural; open; perspective
Relative description: more natural than tallbuilding, less natural than mountain; more open than mountain; less perspective than opencountry

Image 4 (PubFig face)
Binary description: White; not Smiling; VisibleForehead
Relative description: more White than AlexRodriguez; more Smiling than JaredLeto, less Smiling than ZacEfron; more VisibleForehead than JaredLeto, less VisibleForehead than MileyCyrus

Image 5 (PubFig face)
Binary description: White; not Smiling; not VisibleForehead
Relative description: more White than AlexRodriguez, less White than MileyCyrus; less Smiling than HughLaurie; more VisibleForehead than ZacEfron, less VisibleForehead than MileyCyrus

Image 6 (PubFig face)
Binary description: not Young; BushyEyebrows; RoundFace
Relative description: more Young than CliveOwen, less Young than ScarlettJohansson; more BushyEyebrows than ZacEfron, less BushyEyebrows than AlexRodriguez; more RoundFace than CliveOwen, less RoundFace than ZacEfron

  


Data


We provide the learnt relative attributes and their predictions for the two datasets used in our paper: Outdoor Scene Recognition (OSR) and a subset of the Public Figures Face Database (PubFig). 

 

README

Download (v2)

 

Relative Face Attributes Dataset. It contains annotations for 29 relative attributes on 60 categories from the Public Figures Face Database (PubFig). 

  


Code


We modified Olivier Chapelle's RankSVM implementation to train relative attributes with similarity constraints. Our modified code can be found here.

 

If you use our code, please cite the following paper:

D. Parikh and K. Grauman

Relative Attributes

International Conference on Computer Vision (ICCV), 2011.

   


Demos


Demos of various applications of relative attributes can be found here. A description of these applications can be found in the papers here.

 


Publications


D. Parikh and K. Grauman

Relative Attributes

International Conference on Computer Vision (ICCV), 2011. (Oral)

Marr Prize (Best Paper Award) Winner

[slides] [talk (video)] [poster] [demos]

 

Following are our other papers that use relative attributes: 

 

A. Biswas and D. Parikh

Simultaneous Active Learning of Classifiers & Attributes via Relative Feedback

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013

[project page and data] [poster] [demo]

  

A. Parkash and D. Parikh
Attributes for Classifier Feedback
European Conference on Computer Vision (ECCV), 2012 (Oral)

[slides] [talk (video)] [project page and data] [demo]

   
A. Kovashka, D. Parikh and K. Grauman
WhittleSearch: Image Search with Relative Attribute Feedback
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012
[project page] [poster] [demo]

 

D. Parikh, A. Kovashka, A. Parkash and K. Grauman
Relative Attributes for Enhanced Human-Machine Communication (Invited paper)
AAAI Conference on Artificial Intelligence (AAAI), 2012 (Oral)

 

 

[Thanks to Yong Jae Lee for the webpage template]