Abstract
Attributes are mid-level semantic concepts that describe the visual appearance, functional affordances, or other human-understandable aspects of objects and scenes. In recent years, several works have investigated the use of attributes to solve various computer vision problems, including attribute-based image retrieval, zero-shot learning of unseen object categories, part localization, and face recognition.
This thesis proposes two novel attribute-based approaches for solving (i) the top-down visual saliency estimation problem and (ii) the unsupervised zero-shot object classification problem. For top-down saliency estimation, we propose a simple yet efficient approach based on Conditional Random Fields (CRFs), in which attribute classifier outputs serve as visual features. For zero-shot learning, we propose a novel approach that solves the unsupervised zero-shot object classification problem via attribute-class relationships. Unlike other attribute-based approaches, however, ours requires attribute definitions only at training time and needs only the names of the novel classes of interest at test time. Our detailed experimental results show that our methods perform on par with or better than the state of the art.
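To give a rough picture of the first contribution, the sketch below shows one way attribute classifier outputs could feed a binary saliency CRF on a grid. This is a minimal sketch under assumptions of our own choosing, not the thesis implementation: the Potts pairwise term, the ICM inference loop, and the function name saliency_crf_icm are illustrative stand-ins for whatever learning and inference the thesis actually uses.

```python
# Hypothetical sketch: binary saliency CRF on a 4-connected grid, with unary
# potentials derived from attribute classifier outputs and a Potts pairwise
# term, solved by simple ICM. Not the thesis implementation.
import numpy as np

def saliency_crf_icm(unary_fg, unary_bg, pairwise_weight=1.0, n_iters=5):
    """ICM inference on a grid CRF.

    unary_fg, unary_bg: HxW cost maps for labeling each cell salient /
    non-salient, e.g. derived from per-cell attribute classifier scores.
    Returns an HxW binary saliency map.
    """
    H, W = unary_fg.shape
    labels = (unary_fg < unary_bg).astype(np.int32)  # initialize from unaries
    for _ in range(n_iters):
        for i in range(H):
            for j in range(W):
                nbrs = []
                if i > 0:
                    nbrs.append(labels[i - 1, j])
                if i < H - 1:
                    nbrs.append(labels[i + 1, j])
                if j > 0:
                    nbrs.append(labels[i, j - 1])
                if j < W - 1:
                    nbrs.append(labels[i, j + 1])
                nbrs = np.asarray(nbrs)
                # cost of each candidate label = unary + Potts disagreement penalty
                cost_bg = unary_bg[i, j] + pairwise_weight * np.sum(nbrs != 0)
                cost_fg = unary_fg[i, j] + pairwise_weight * np.sum(nbrs != 1)
                labels[i, j] = int(cost_fg < cost_bg)
    return labels
```

For instance, if s[i, j] in (0, 1) were an attribute-driven saliency score for a cell, unary_fg could be taken as -log(s) and unary_bg as -log(1 - s).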
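The second contribution can likewise be illustrated with a small sketch of zero-shot classification via attribute-class relationships. Again this is an assumption-laden sketch rather than the thesis method: the similarity-weighted transfer of attribute signatures from seen to unseen classes via class-name word embeddings, and the helper names infer_unseen_signatures and classify_zero_shot, are hypothetical choices, and a separately trained attribute classifier is assumed to supply per-image attribute scores.

```python
# Hypothetical sketch: unsupervised zero-shot classification via attribute-class
# relationships. Unseen-class attribute signatures are estimated from class
# names alone (here by similarity-weighted transfer from seen-class signatures
# using class-name word embeddings). Not the thesis method.
import numpy as np

def infer_unseen_signatures(seen_signatures, seen_emb, unseen_emb):
    """Estimate unseen-class attribute signatures as a similarity-weighted
    combination of seen-class signatures (one common heuristic)."""
    seen_n = seen_emb / np.linalg.norm(seen_emb, axis=1, keepdims=True)
    unseen_n = unseen_emb / np.linalg.norm(unseen_emb, axis=1, keepdims=True)
    sim = unseen_n @ seen_n.T                        # (n_unseen, n_seen) cosine sims
    weights = np.maximum(sim, 0.0)
    weights /= weights.sum(axis=1, keepdims=True) + 1e-8
    return weights @ seen_signatures                 # (n_unseen, n_attributes)

def classify_zero_shot(attr_scores, unseen_signatures):
    """Assign each image to the unseen class whose estimated attribute
    signature is nearest to the image's predicted attribute scores."""
    dists = np.linalg.norm(
        attr_scores[:, None, :] - unseen_signatures[None, :, :], axis=2)
    return dists.argmin(axis=1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seen_sig = rng.random((5, 12))                 # 5 seen classes, 12 attributes
    seen_emb = rng.standard_normal((5, 50))        # embeddings of seen class names
    unseen_emb = rng.standard_normal((3, 50))      # embeddings of unseen class names
    unseen_sig = infer_unseen_signatures(seen_sig, seen_emb, unseen_emb)
    attr_scores = rng.random((4, 12))              # attribute scores for 4 test images
    print(classify_zero_shot(attr_scores, unseen_sig))
```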