Interactively Building a Discriminative Vocabulary of Nameable Attributes
Devi Parikh and Kristen Grauman

Abstract
Human-nameable visual attributes offer many advantages when used as mid-level features
for object recognition, but existing techniques to gather relevant
attributes can be inefficient (costing substantial effort or expertise)
and/or insufficient (descriptive properties need not be
discriminative). We introduce an approach to define a vocabulary of
attributes that is both human understandable and discriminative. The
system takes object/scene-labeled images as input, and returns as
output a set of attributes elicited from human annotators that
distinguish the categories of interest. To ensure a compact vocabulary
and efficient use of annotators’ effort, we 1) show how to
actively augment the vocabulary such that new attributes resolve
inter-class confusions, and 2) propose a novel "nameability" manifold
that prioritizes candidate attributes by their likelihood of being
associated with a nameable property. We demonstrate the approach with
multiple datasets, and show its clear advantages over baselines that
lack a nameability model or rely on a list of expert-provided
attributes.
Motivation
To be most useful, attributes should be:
Discriminative: so that they can be learnt reliably in the available feature-space and can effectively classify the categories, and
Nameable: so that they can be used for zero-shot learning, describing previously unseen instances or unusual aspects of images, etc.
| Existing Approaches | Discriminative | Nameable |
| Hand-generated list | Not necessarily | Yes |
| Mining the web | Not necessarily | Yes |
| Automatic splits of categories | Yes | No |
| Proposed | Yes | Yes |
Proposal
We propose an interactive approach that prompts a human-in-the-loop to
provide names for attribute hypotheses it discovers. The system takes
as input a set of training images with their associated category
labels, as well as one or more visual feature spaces (Gist, color,
etc.), and returns as output a set of attribute models that together
can distinguish the categories of interest.
To visualize a candidate attribute for which the system seeks a name,
a human is shown images sampled along the direction normal to some
separating hyperplane in the feature space. Since many hypotheses will
not correspond to something humans can visually identify and succinctly
describe, a naive attribute discovery process (one that simply cycles
through discriminative splits and asks the annotator to either name or
reject them) is impractical.
Instead, we design the approach to actively minimize the number of
meaningless inquiries presented to an annotator, so that human effort
is mostly spent assigning meaning to divisions in feature space that
actually have it, as opposed to discarding uninterpretable splits.
We accomplish this with two key ideas. At each iteration, our approach:
1) focuses on attribute hypotheses that complement the classification
power of existing attributes collected thus far, and
2) predicts the nameability of each discriminative hypothesis and
prioritizes those likely to be nameable. For this, we explore whether
there exists some manifold structure in the space of nameable
hyperplane separators.
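
The first idea depends on knowing which categories the current attributes still confuse. A minimal sketch of that bookkeeping follows. It assumes a confusion matrix of counts is available and, as a simplification, picks a single pair of classes rather than a larger subset. The function name, category names, and counts are all hypothetical and do not come from the paper.

```python
def most_confused_pair(confusion, class_names):
    """Return the two distinct classes with the most mutual confusion.

    confusion[i][j] counts images of true class i predicted as class j.
    Picking a single pair is a simplification made for this sketch.
    """
    best_pair, best_count = None, -1
    n = len(class_names)
    for i in range(n):
        for j in range(i + 1, n):
            # Count mistakes in both directions between classes i and j.
            mistakes = confusion[i][j] + confusion[j][i]
            if mistakes > best_count:
                best_pair = (class_names[i], class_names[j])
                best_count = mistakes
    return best_pair, best_count


# Hypothetical confusion counts for three scene categories.
names = ["coast", "forest", "street"]
confusion = [
    [40, 2, 8],
    [3, 45, 2],
    [9, 1, 40],
]
pair, count = most_confused_pair(confusion, names)
print(pair, count)
```

A new attribute hypothesis would then be sought among hyperplanes that separate the returned classes, so that each question put to the annotator targets a confusion the current vocabulary has not yet resolved.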

Approach
There are three main challenges to be addressed in our proposed
interactive approach:
Discovering attribute hypotheses: We actively discover hyperplanes in
the visual feature-space that separate a subset of classes that are
currently most confused. We use iterative max-margin clustering to
discover such a split.
Predicting the nameability of a hypothesis: At each iteration, we build
a nameability manifold by fitting a mixture of probabilistic principal
component analyzers to the user responses collected so far. The
manifold is learnt in the space of hyperplane parameters. As seen
below, the manifold can effectively predict the nameability of a novel
discriminative hyperplane.

Visualizing an attribute: In order to present a visualization of a
hyperplane to the user, we sample images from the dataset such that
their distance orthogonal to the hyperplane varies, but any variations
along the hyperplane are minimized. The user is then asked to name a
visual property that is varying in the images from left to right. This
name, along with the hyperplane parameters, forms our newly discovered
attribute.
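
The visualization step can be illustrated with a small sketch. It assumes a linear hyperplane w.x + b = 0 and uses a simple greedy heuristic: for each of k evenly spaced target distances, it picks the image that is closest to the target while staying near a common in-plane reference point. The cost function, the `in_plane_weight` parameter, and the toy features are assumptions made for illustration, not the authors' sampling procedure.

```python
import math


def signed_distance(x, w, b):
    """Signed distance from point x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def in_plane_component(x, w, b):
    """Projection of x onto the hyperplane (normal component removed)."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    d = signed_distance(x, w, b)
    return [xi - d * wi / norm for xi, wi in zip(x, w)]


def sample_along_normal(features, w, b, k=3, in_plane_weight=1.0):
    """Pick k image indices, ordered from most negative to most positive side.

    Signed distances spread from the minimum to the maximum observed value,
    while drift along the hyperplane is penalized. Requires k >= 2.
    """
    dists = [signed_distance(x, w, b) for x in features]
    planes = [in_plane_component(x, w, b) for x in features]
    dim = len(w)
    # Reference point: mean of the in-plane components (an assumed choice).
    ref = [sum(p[i] for p in planes) / len(planes) for i in range(dim)]
    lo, hi = min(dists), max(dists)
    targets = [lo + t * (hi - lo) / (k - 1) for t in range(k)]
    chosen = []
    for target in targets:
        def cost(idx):
            drift = sum((planes[idx][i] - ref[i]) ** 2 for i in range(dim))
            return (dists[idx] - target) ** 2 + in_plane_weight * drift
        # Each image is shown at most once.
        candidates = [i for i in range(len(features)) if i not in chosen]
        chosen.append(min(candidates, key=cost))
    return chosen


# Toy 2-D features; the hyperplane is the vertical axis x[0] = 0.
features = [
    [-2.0, 0.1], [-2.0, 3.0],
    [0.0, 0.0], [0.1, 2.5],
    [2.0, -0.1], [2.0, -3.0],
]
order = sample_along_normal(features, w=[1.0, 0.0], b=0.0, k=3)
print(order)
```

In this toy case the selected images differ mainly in the coordinate along the normal, so the only thing that visibly changes from left to right is the candidate attribute itself.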

Evaluation
We evaluate our approach on two datasets of 8 categories each: Outdoor
Scene Recognition (OSR) and a subset of the Animals with Attributes
(AWA) dataset. For both datasets, we use Gist and color features.
In order to automatically evaluate our proposed approach, we collect
nameability annotations of all discriminative hyperplanes (247) in both
feature-spaces of both datasets. We show a visualization of each of the
hyperplanes to 20 Amazon Mechanical Turk subjects, and ask them to
indicate how obvious a change is visible in the images (on a scale
of 1-4), and what the changing property is. Example responses are shown
below:
"Black"

"Spotted"

Unnameable

"Green"

"Congested"

We consider a hyperplane to be nameable if the average 'obviousness'
score received is above 3. This pool of annotated hyperplanes can now
be used to conduct automatic experiments, while still mimicking a real
user in the loop.
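
The nameability rule above amounts to thresholding a mean rating. The sketch below shows it for two hyperplanes, each rated by 20 subjects on the 1-4 scale. The hyperplane identifiers and rating values are invented for illustration, and how the free-text names from different subjects are combined is not covered here.

```python
def is_nameable(scores, threshold=3.0):
    """True if the mean obviousness score is strictly above the threshold."""
    return sum(scores) / len(scores) > threshold


# Hypothetical ratings from 20 subjects per hyperplane (scale of 1-4).
ratings = {
    "hyperplane_a": [4] * 12 + [3] * 8,   # mean 3.6
    "hyperplane_b": [2] * 10 + [3] * 10,  # mean 2.5
}
pool = {name: is_nameable(scores) for name, scores in ratings.items()}
print(pool)
```

Because the rule says "above 3", a hyperplane whose mean is exactly 3 is treated as unnameable in this sketch.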
Results
Discriminative-only baseline: Compared to a baseline that presents the
discriminative hyperplanes to the user without nameability modeling
(see below), our approach discovers more named attributes with the same
user effort, which also leads to better recognition performance.

Descriptive-only baseline: On the other hand, compared to purely
descriptive attributes, our approach finds more discriminative
attributes, which also leads to improved recognition performance (see
below).

Automatically generated descriptions: Our discovered attributes can be
used to describe previously seen and previously unseen (e.g., zebra)
images, as seen below.

Publications
D. Parikh and K. Grauman
Interactively Building a Discriminative Vocabulary of Nameable Attributes
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011
[supplementary material] [poster] [slides]
D. Parikh and K. Grauman
Interactive Discovery of Task-Specific Nameable Attributes (Abstract)
First Workshop on Fine-Grained Visual Categorization (FGVC), held in
conjunction with IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), 2011 (Best Poster Award)
[poster]
[Thanks to Yong Jae Lee for the webpage template]