ECE 5984 Spring 2014

ECE 5984: Advanced Topics in Computer Vision, Spring 2014

Electrical and Computer Engineering Department, Virginia Tech

Meets: TR 3:30 pm to 4:45 pm in McBryde Hall (MCB) 233 (moved from Hutcheson Hall (HUTCH) 207).

Instructor: Devi Parikh
Email: parikh@vt.edu
Office: 440 Whittemore Hall
Office hours: By appointment.

Course overview | Pre-requisite | Requirements | Important dates | Schedule | Resources

Prize Winners!

Best Paper Presentation: Jason Ziglar and John Peterson

Best Discussion Participant: Ramakrishna Vedantam

Best Project: Michael Cogswell

Congratulations!

Class projects:

Dual Channel Analytics and Tracking of Cells Experiencing the Dielectrophoretic Force (Lisa Anders)

Semantic Segmentation with Deep Learning (Michael Cogswell)

Building High-Level Object Vocabularies (Jacob Dennis)

Application of Selective Search to Pose Estimation (Ujwal Krothapalli)

Making Intelligent and Interpretable Classification Systems (Shrenik Lad)

Towards Cascade Object Detection (John Peterson)

Person of Interest in Images (Clint Solomon)

Understanding Predictions of Structured Probabilistic Vision Systems (Qing Sun)

Improving Image Segmentation using Object Proposals (Sean Thweatt)

Understanding and Predicting Importance for Abstract Images (Ramakrishna Vedantam)

Improving SIFT Matching by Interest Points Filtering (Rabih Younes)

Salient Superpixels (Jason Ziglar)

Improving DPM accuracy with Component Selection Strategy (Peng Zhang)

Course overview:

This is a graduate course in computer vision. The focus of this course is to survey and critique current and state-of-the-art approaches in computer vision. We will read and analyze the strengths and weaknesses of research papers on a variety of important topics pertaining to visual recognition and identify open research questions. See the schedule for a list of topics we will cover.

Pre-requisite:

An introduction to computer vision or equivalent course. A machine learning or pattern recognition course may be beneficial.

Requirements:

The following are the requirements to successfully complete this course:

Discussion
Paper reviews
Presentations
Project

Discussion (15% of your grade): Students will be required to read the assigned papers before each class and actively participate in discussions in class.

Paper reviews (25% of your grade): Students will be required to write a detailed review of one assigned paper and a high-level review of another assigned paper before each class. The combined reviews should be no more than one page (11 point, Times New Roman, 1 inch margins). Please submit the review as firstname_lastname_MM_DD.pdf (all in lowercase, where MM is the month and DD is the day).

Detailed review: Each review should summarize the paper in 2-3 sentences, describe the approach taken, and clearly identify the main contribution of the paper. The review should describe the strengths and weaknesses of the paper. Comments on how convincing the experiments were, if the material was well presented, and how the paper can be improved and extended should be included. A good review also comments on how the paper relates to other papers we have read or you know of. Finally, identify any interesting open research questions or applications that arise from reading the paper.
High-level review: Describe the problem being addressed and provide a high-level description of the general approach or intuition behind the approach. Details of the approach, discussion of strengths and weaknesses, etc. are not required.
Due dates: The reviews should be emailed to the instructor by 12:00 pm (noon) on the day of the class (i.e. on Tuesdays and Thursdays). You can use up to 3 late days over the course of the semester. Beyond that, your submission will not be accepted. You need not submit paper reviews for the classes where you are presenting the papers (see below).
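
As a small illustration of the file naming convention for reviews, the following sketch builds the expected file name from a student's name and the class date. The function name `review_filename` and the sample name are hypothetical and not part of the course materials; it assumes first and last names contain no spaces.

```python
from datetime import date


def review_filename(first_name: str, last_name: str, class_date: date) -> str:
    """Return firstname_lastname_MM_DD.pdf in lowercase for the given class date."""
    # %m and %d are zero-padded, so January 28th becomes 01_28.
    return f"{first_name.strip().lower()}_{last_name.strip().lower()}_{class_date:%m_%d}.pdf"


# Hypothetical student submitting a review for the 01/28 class.
print(review_filename("Jane", "Doe", date(2014, 1, 28)))  # jane_doe_01_28.pdf
```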

Presentations (25% of your grade): Each student will be asked to present the topic associated with a class 2 or 3 times over the course of the semester (the number will depend on class enrollment). Each presentation should be 45 minutes long. Students should practice their talks ahead of time to make sure they are 45 minutes long -- not shorter by more than a few minutes, and certainly not longer. The talks should be well organized and polished. The following will be required to prepare the presentation:

Papers (65% of the presentation grade): The student should read the assigned papers and other background papers to gain a good perspective on the topic as a whole. In addition to presenting this perspective and background information, students should present at least two papers in detail. For each paper, students should clearly state the problem and motivate why it is interesting and important. The key technical ideas should be presented. The student should describe the experimental set-up and present the results obtained. The strengths and weaknesses of the paper should be discussed. The student should discuss how the different papers relate to each other (similarities and differences). Finally, interesting open research questions should be identified. The slides should be made as visual (with videos, images, animations) and clear as possible. The student should look at the links provided next to each paper, as well as the authors' webpages, for extra material such as slides, videos, extra results, etc. Students are encouraged to search for relevant material online; the links provided here are not comprehensive. Please clearly cite the source of each slide that is not your own. Even if you use slides made by the author, it is your responsibility to make sure your presentation as a whole flows well.

Experiments (35% of the presentation grade): The student should also conduct some (small-scale) experiments on at least one of the papers (any of those listed or other relevant papers) to analyze an interesting and meaningful aspect of the approach that the paper has not analyzed (e.g. different datasets, sensitivity to any parameters in the approach, etc.) to gain a more complete understanding of the paper, and to see if the approach "really works". A distilled demo-version of the main idea of the approach presented in the paper should be implemented. The goal is not to regenerate the results already present in the paper. You may implement it yourself, or download code if available. Again, the experimental setup, any non-trivial implementation choices you made, results obtained and conclusions one can draw from them should be described in the presentation (within the 45 minutes). Please cite any existing code or data you use for your experiments.

Announcement: Please email the instructor 6 topic preferences (along with the associated dates) for your presentations by the end of the day Wednesday, January 22nd. See the schedule for the semester.

Project (35% of your grade): Your project can be about extending a technique we studied in class, or empirically analyzing it. Comparisons between two approaches are also welcome. It is wonderful if you design and evaluate a novel approach to an important vision problem. Look at the schedule to get ideas on what topics might be of interest to you for your project. Perhaps the experiments part of your presentation above might give you project ideas. Be creative! If you need help with ideas for your project please come talk to me. You can work with a partner if you like. The following are deliverables for your project.

Proposal (10% of your project grade, due March 6th): A 1-page description that describes the following:

Problem statement: Clearly state the goal of your project.
Related work: Briefly describe existing related work (with citations) and what your project brings to the table that these other works do not. The most relevant papers may not necessarily be papers listed on the schedule, so be sure to also look beyond the list.
Approach: Describe the technical approach you plan to employ.
Experiments and results: Describe the experimental setup you will follow, which datasets you will use, which existing code you will exploit, what you will implement yourself, and what you would define as a success for the project. If you plan on collecting your own data, describe what data collection protocol you will follow. Specify if you plan on experimentally analyzing different characteristics of your approach, or if you will compare to existing techniques. Provide a list of experiments you will perform. Describe what you expect the experiments to reveal, or what is uncertain about the potential outcomes. If you have any preliminary results, please summarize those as well.

Mid-semester presentation (20% of your project grade, in class on March 27th and April 1st): A 10-minute presentation that describes the same points as the proposal in as much detail as 10 minutes allow. Describe any challenges you are facing. Clearly state what has already been accomplished, and what remains for the end of the semester.

Final presentation (35% of your project grade, in class between April 24th and May 6th): A 10-minute presentation describing the same points as above. Any assumptions of your approach should be clearly stated. Any insights on future extensions of this project should be discussed.

Final report (35% of your project grade, due May 12th): A 4-page self-contained document describing your project. It should follow the same format and quality of presentation as the conference papers you have been reading in this class. Thoroughly describe related work and explain what your work brings to the table that existing work did not. Any student in the class should be able to understand your report clearly enough to implement it. The reader should not have to read another paper to understand your approach. Justify any design choices or judgment calls you made in your approach. When describing the experiments, describe why you conducted each experiment, and what you can conclude from the results. Include clear figures and tables, as well as illustrative qualitative examples if appropriate. If your project has a new idea and promising results, this report could serve as a first draft for a future conference submission! Please use the CVPR LaTeX template (http://www.pamitc.org/cvpr13/files/cvpr2013AuthorKit.zip) to write your report.
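
To see how the percentages listed in the requirements fit together, here is a minimal sketch that combines component scores into an overall score. The weights are taken from this page. The 0 to 100 score scale, the plain weighted sum, the function name `weighted_score`, and the sample scores are all assumptions made for illustration, and may not reflect how grades are actually computed.

```python
# Weights in percent, as stated in the requirements on this page.
COURSE_WEIGHTS = {"discussion": 15, "paper_reviews": 25, "presentations": 25, "project": 35}
PRESENTATION_WEIGHTS = {"papers": 65, "experiments": 35}
PROJECT_WEIGHTS = {"proposal": 10, "mid_semester": 20, "final_presentation": 35, "final_report": 35}


def weighted_score(scores: dict, weights: dict) -> float:
    """Weighted average of scores (assumed 0 to 100) using percent weights."""
    if sum(weights.values()) != 100:
        raise ValueError("weights must sum to 100 percent")
    return sum(scores[name] * weight for name, weight in weights.items()) / 100


# Hypothetical scores, for illustration only.
presentations = weighted_score({"papers": 90, "experiments": 80}, PRESENTATION_WEIGHTS)
project = weighted_score(
    {"proposal": 90, "mid_semester": 80, "final_presentation": 85, "final_report": 95},
    PROJECT_WEIGHTS,
)
overall = weighted_score(
    {"discussion": 100, "paper_reviews": 90, "presentations": presentations, "project": project},
    COURSE_WEIGHTS,
)
print(presentations, project, overall)
```

Each group of weights sums to 100%, so a component score can be computed from its parts first and then fed into the course-level sum.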

The class will vote on best paper presentation, best project and best discussion-participant!

Feedback is very welcome. If you have any questions or concerns about the class or the requirements, please be sure to discuss them with the instructor early on. Please email the instructor to set up an appointment.

No laptops, cell phones or other distractions in class please.

Important dates:

January 22nd: Email the instructor 6 topic preferences (along with the associated dates) you would like to present in class. Please see the schedule for the semester.
March 6th: Project proposals due
March 27th and April 1st: Mid-semester project presentations in class
May 10th: Final project presentations from 10:00 am to 1:30 pm (originally scheduled in class, April 24th-May 6th)
May 12th: Final project report due

Schedule (topic and papers)

Date	Topic and papers	Presenter
01/21	Introduction	Devi [slides]
01/23	Research overview	Devi
01/28	Local features-based image descriptions Detail: Object Categorization by Learned Universal Visual Dictionary. J. Winn, A. Criminisi and T. Minka. ICCV 2005. [project page] High-level: A Performance Evaluation of Local Descriptors. K. Mikolajczyk and C. Schmid. CVPR 2003. Background: Seminal paper: Object Recognition from Local Scale-Invariant Features. D. Lowe. ICCV 1999. [code] [other implementations of SIFT] [IJCV paper] Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories. S. Lazebnik, C. Schmid and J. Ponce. CVPR 2006. [15 scenes dataset] [pyramid match toolkit] [Matlab code]	Rabih
01/30	Object discovery Detail: Using Multiple Segmentations to Discover Objects and their Extent in Image Collections. B. C. Russell, A. A. Efros, J. Sivic, W. T. Freeman and A. Zisserman. CVPR 2006. [code] High-level: Foreground Focus: Finding Meaningful Features in Unlabeled Images. Y. J. Lee and K. Grauman. BMVC 2008. [project page]	Jacob
02/04	Object detection Detail: A Discriminatively Trained, Multiscale, Deformable Part Model. P. Felzenszwalb, D. McAllester and D. Ramanan. CVPR 2008. [code] High-level: Histograms of Oriented Gradients for Human Detection. N. Dalal and B. Triggs. CVPR 2005. [video] [PASCAL datasets] Extra: Rapid Object Detection Using a Boosted Cascade of Simple Features. P. Viola and M. Jones. CVPR 2001. Diagnosing Error in Object Detectors. D. Hoiem, Y. Chodpathumwan and Q. Dai. ECCV 2012. [code and data]	Ujwal
02/06	Object proposals Detail: What is an Object? B. Alexe, T. Deselaers and V. Ferrari. CVPR 2010. [code] High-level: Category Independent Object Proposals. I. Endres and D. Hoiem. ECCV 2010. [project] Extra: Constrained Parametric Min-Cuts for Automatic Object Segmentation. J. Carreira and C. Sminchisescu. CVPR 2010. [code]	Qing
02/11	Segmentation Detail: Learning a Classification Model for Segmentation. X. Ren and J. Malik. ICCV 2003. High-level: Combining Top-down and Bottom-up Segmentation. E. Borenstein, E. Sharon and S. Ullman. CVPR workshop 2004. [data]	Sean (papers) Michael (experiments)
02/13	Classes canceled (snow)	N/A
02/18	Pose Detail: Real-Time Human Pose Recognition in Parts from a Single Depth Image. J. Shotton, A. Fitzgibbon, M. Cook, T. Sharp, M. Finocchio, R. Moore, A. Kipman and A. Blake. CVPR 2011. [video] [project page] High-level: Poselets: Body Part Detectors Trained Using 3D Human Pose Annotations. L. Bourdev and J. Malik. ICCV 2009. [code] Extra: Articulated Pose Estimation using Flexible Mixtures of Parts. Y. Yang and D. Ramanan. CVPR 2011. [code]	Ujwal (papers) Qing (experiments)
02/20	Context Detail: Object-Graphs for Context-Aware Category Discovery. Y. J. Lee and K. Grauman. CVPR 2010. [code] High-level: An Empirical Study of Context in Object Detection. S. Divvala, D. Hoiem, J. Hays, A. Efros and M. Hebert. CVPR 2009. [project page]	Jason
02/25	Holistic scene understanding Detail: TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-Class Object Recognition and Segmentation. J. Shotton, J. Winn, C. Rother and A. Criminisi. ECCV 2006. [project page] [data] [code] High-level: Describing the Scene as a Whole: Joint Object Detection, Scene Classification and Semantic Segmentation. J. Yao, S. Fidler and R. Urtasun. CVPR 2012.	Michael
02/27	Groups of objects Detail: Recognition Using Visual Phrases. M. Sadeghi and A. Farhadi. CVPR 2011. High-level: Automatic Discovery of Groups of Objects for Scene Understanding. C. Li, D. Parikh and T. Chen. CVPR 2012. [project page]	Prakriti
03/04	Saliency Detail: Learning to Predict Where Humans Look. T. Judd, K. Ehinger, F. Durand, and A. Torralba. ICCV 2009. [project page] High-level: Learning to Detect a Salient Object. T. Liu, J. Sun, N. Zheng, X. Tang, H. Shum. CVPR 2007. [results] [data] [code] Extra: A Model of Saliency-based Visual Attention for Rapid Scene Analysis. L. Itti, C. Koch, and E. Niebur. PAMI 1998.	Clint
03/06	Project proposals due Importance Detail: Reading Between the Lines: Object Localization Using Implicit Cues from Image Tags. S. J. Hwang and K. Grauman. CVPR 2010. High-level: Understanding and Predicting Importance in Images. A. Berg, T. Berg, H. Daume, J. Dodge, A. Goyal, X. Han, A. Mensch, M. Mitchell, A. Sood, K. Stratos and K. Yamaguchi. CVPR 2012. [UIUC sentence dataset] [ImageClef dataset] Extra: Some Objects are More Equal Than Others: Measuring and Predicting Importance. M. Spain and P. Perona. ECCV 2008. What Makes an Image Memorable? P. Isola, J. Xiao, A. Torralba, A. Oliva. CVPR 2011. [project page] [code and data]	Sean
03/11	Spring break: no class	N/A
03/13	Spring break: no class	N/A
03/18	Action Recognition Detail: Learning a Hierarchy of Discriminative Space-Time Neighborhood Features for Human Action Recognition. A. Kovashka and K. Grauman. CVPR 2010. High-level: Action Recognition from a Distributed Representation of Pose and Appearance. S. Maji, L. Bourdev and J. Malik. CVPR 2011. [code]	John
03/20	Global and high-level image descriptions Detail: Efficient Object Category Recognition Using Classemes. L. Torresani, M. Szummer and A. Fitzgibbon. ECCV 2010. [code and data] High-level: Objects as Attributes for Scene Classification. L.-J. Li, H. Su, Y. Lim and L. Fei-Fei, 1st International Workshop on Parts and Attributes, ECCV 2010. Extra: Modeling the Shape of the Scene: a Holistic Representation of the Spatial Envelope. A. Oliva and A. Torralba. IJCV 2001. [Gist code] Object Bank: A High-Level Image Representation for Scene Classification & Semantic Feature Sparsification. L-J. Li, H. Su, E. Xing, L. Fei-Fei. NIPS 2010. [code]	Clint (papers) Rama (experiments)
03/25	Attributes Detail: Learning To Detect Unseen Object Classes by Between-Class Attribute Transfer. C. Lampert, H. Nickisch and S. Harmeling. CVPR 2009. [project page with data] High-level: Describing Objects by Their Attributes. A. Farhadi, I. Endres, D. Hoiem and D. Forsyth, CVPR 2009. [data] Extra: Relative Attributes. D. Parikh and K. Grauman. ICCV 2011. [code and data]	Peng (papers) Shrenik (experiments)
03/27	Mid-semester project presentations	TBD
04/01	Mid-semester project presentations	TBD
04/03	Human-in-the-loop Detail: Visual Recognition with Humans in the Loop. S. Branson, C. Wah, B. Babenko, F. Schroff, P. Welinder, P. Perona and S. Belongie. ECCV 2010. High-level: Large-Scale Live Active Learning: Training Object Detectors with Crawled Data and Crowds. S. Vijayanarasimhan and K. Grauman. CVPR 2011. Extra: iCoseg: Interactive Co-segmentation with Intelligent Scribble Guidance. D. Batra, A. Kowdle, D. Parikh, J. Luo and T. Chen. CVPR 2010. [project page]	Jason (papers)
04/08	Crowdsourcing Detail: Labeling Images with a Computer Game. L. von Ahn and L. Dabbish. CHI 2004. High-level: Adaptively Learning the Crowd Kernel. O. Tamuz, C. Liu, S. Belongie, O. Shamir and A. Kalai. ICML 2011. Extra: Crowdclustering. R. Gomes, P. Welinder, A. Krause and P. Perona. NIPS 2011.	Shrenik
04/10	Big data Detail: IM2GPS: Estimating Geographic Information From a Single Image. J. Hays and A. Efros. CVPR 2008. [project page with data and Flickr download scripts] High-level: Unbiased Look at Dataset Bias. A. Torralba and A. Efros. CVPR 2011. [project page] Extra: Scene Completion using Millions of Photographs. J. Hays and A. Efros. SIGGRAPH 2007. [project page] 80 Million Tiny Images: A Large Dataset for Non-Parametric Object and Scene Recognition. A. Torralba, R. Fergus and W. Freeman. PAMI 2008. [project page]	John
04/15	Applications Detail: Photo Tourism: Exploring Photo Collections in 3D. N. Snavely, S. Seitz and R. Szeliski. SIGGRAPH 2006. [project page] High-level: LeafSnap: A Computer Vision System for Automatic Plant Species Identification. N. Kumar, P. Belhumeur, A. Biswas, D. Jacobs, W. Kress, I. Lopez, J. Soares. ECCV 2012. Extra: FaceTracer: A Search Engine for Large Collections of Images with Faces. N. Kumar, P. Belhumeur and S. Nayar. ECCV 2008. [code, data, demo]	Lisa (papers) Peng (experiments)
04/17	Human abilities Detail: Rapid natural scene categorization in the near absence of attention. L. Fei-Fei, R. VanRullen, C. Koch and P. Perona. PNAS 2002. High-level: What Do We Perceive in a Glance of a Real-World Scene? L. Fei-Fei, A. Iyer, C. Koch and P. Perona. Journal of Vision, 2007.	No class (Reviews are still due)
04/22	Language and images Detail: Every Picture Tells a Story: Generating Sentences for Images. A. Farhadi, M. Hejrati, A. Sadeghi, P. Young, C. Rashtchian, J. Hockenmaier and D. Forsyth. ECCV 2010. [UIUC sentence dataset] High-level: Baby Talk: Understanding and Generating Simple Image Descriptions. G. Kulkarni, V. Premraj, S. Dhar, S. Li, Y. Choi, A. C. Berg and T. L. Berg. CVPR 2012. Extra: Beyond Nouns: Exploiting Prepositions and Comparative Adjectives for Learning Visual Classifiers. A. Gupta and Larry S. Davis. ECCV 2008.	Rama
04/24	Images of people Detail: Names and Faces in the News. T. Berg, A. Berg, J. Edwards, M. Maire, R. White, Y. Teh, E. Learned-Miller and D. Forsyth. CVPR 2004. [project page] High-level: Estimating Age, Gender and Identity using First Name Priors. A. Gallagher and T. Chen. CVPR 2008. [project page] Extra: Exploring Photobios. I. Kemelmacher-Shlizerman, E. Shechtman, R. Garg and S. Seitz. SIGGRAPH 2011. [project page] Autotagging Facebook: Social Network Context Improves Photo Annotation. Z. Stone, T. Zickler and T. Darrell. CVPR Internet Vision Workshop 2008.	Lisa
04/29	No class; work on projects.	N/A
05/01	No class; work on projects.	N/A
05/06	No class; work on projects.	N/A
05/10	Final project presentations (10:00 am to 1:30 pm)	All
05/12	Final project reports due	N/A

Resources

Other code and data:

Visual Object Recognition synthesis lecture by Grauman and Leibe (short book on object recognition methods)
Compiled list of recognition datasets
OpenCV (open source computer vision library)
Weka (Java data mining software)
Netlab (Matlab toolbox for data analysis techniques, written by Ian Nabney and Christopher Bishop)
CV Online
Annotated Computer Vision Bibliography
Oxford group interest point software
Andrea Vedaldi's VLFeat code, including SIFT, MSER, hierarchical k-means.
INRIA LEAR team's software, including interest points, shape features
FLANN - Fast Library for Approximate Nearest Neighbors. Marius Muja et al.
Google Goggles
Kooaba
LSH homepage
Code for downloading Flickr images, by James Hays
UW Community Photo Collections homepage
INRIA Holiday images dataset
NUS-WIDE tagged image dataset of 269K images
MIRFlickr dataset
LIBPMK feature extraction code, includes dense sampling
LIBSVM library for support vector machines
PASCAL VOC Visual Object Classes Challenge
Fast SLIC superpixels
Greg Mori's superpixel code
Berkeley Segmentation Dataset and code
Pedro Felzenszwalb's graph-based segmentation code
Mean-shift: a Robust Approach Towards Feature Space Analysis [pdf] [code, Matlab interface by Shai Bagon]
David Blei's Topic modeling code
Berkeley 3D object dataset (Kinect)
LabelMe Database
Stanford Event Dataset
SUN Scene and object dataset
ImageNet dataset of 15K objects and ImageNet challenge
Animals with Attributes dataset
aYahoo and aPascal attributes datasets
Attribute discovery dataset of shopping categories
Public Figures Face database with attributes
Relative attributes data
WhittleSearch relative attributes data
SUN Scenes attribute dataset
Cross-category object recognition (CORE) dataset
Leeds Butterfly Dataset
FaceTracer database from Columbia
Caltech-UCSD Birds dataset
Database of human attributes
Face detection code in OpenCV
Gallagher's Person Dataset
Face data from Buffy episode, from Oxford Visual Geometry Group
CALVIN upper-body detector code
UMass Labeled Faces in the Wild
Ivan Laptev's Space-Time Interest Points code
Hollywood activity dataset
Stanford 40 Actions still image dataset
Stanford People Playing Musical Instrument dataset
UCF activity datasets
PASCAL VOC action recognition taster challenge
TRECVID video retrieval challenge
UMich Collective Activity dataset
Egovision workshop at CVPR 2012
Amazon Mechanical Turk
Using Mechanical Turk with LabelMe
Point Cloud Library
Robot Operating System
KITTI Benchmark

Tutorials, workshops, summer schools:

Similar courses:

This course has been inspired by the following two courses:

Visual Recognition (Kristen Grauman, Texas-Austin, Fall 2012)
Learning-Based Methods in Vision (Alyosha Efros, CMU, Spring 2012)

Other similar courses:

Grounding Object Recognition and Scene Understanding (Antonio Torralba, MIT, Fall 2011)
Visual Scene Understanding (Derek Hoiem, UIUC, Spring 2009)
Statistical Models for Visual Recognition (Deva Ramanan, UCI, Winter 2009)
Object Recognition and Scene Understanding (Antonio Torralba, MIT, Fall 2008)
Scene Understanding Seminar (Aude Oliva, MIT, Fall 2006)
Selected Topics in Vision & Learning (Serge Belongie, UCSD, Spring 2011)
Learning and Inference in Vision (Bill Freeman, MIT)
Cutting Edge of Computer Vision (Fei-Fei Li, Stanford)
Recognizing People, Objects, and Scenes (Jitendra Malik, Berkeley)
Recognition Problems in Computer Vision (Greg Mori, SFU)
Vision and Learning (Jianbo Shi, UPenn)