DERİN ÖĞRENME İLE GRUP HAREKETLERİNİN SABİT RESİM ÜZERİNDEN TANINMASI (Recognition of Group Activities from Still Images with Deep Learning)

ATVAR, ANIL

Date

2018

Open access

Abstract

The main problem addressed in this thesis is inferring group activity information from still images and classifying it. Activity information is often most meaningful when analyzed along a timeline, and a still image provides no such temporal context; this is one of the factors that complicate the problem. For example, two people standing side by side in a still image are likely to be classified as performing the same activity even when they are not. For reasons like this, classifying group activities in still images is a challenging problem. To overcome these difficulties, individual people must first be detected and classified in the still image with high accuracy; this step constitutes the first part of the thesis. Deep learning techniques, which have produced successful results in object detection and classification in recent years, were chosen to solve the problems addressed in this thesis. Representing individual people and groups in images requires well-chosen features, and this is where deep learning has an advantage over other methods: the features are learned automatically by the model. Choosing deep learning as the basis of the solution brings its own challenges, foremost among them selecting a model suited to the problem. Since the task is a classification problem, Convolutional Neural Networks (CNNs) [1] were chosen and adapted for group activity recognition. The ResNet [2] architecture, widely used for complex classification problems in recent years, was selected as the model. Another difficulty in deep learning is deciding on the size and diversity of the dataset.
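
For reference, the building block that distinguishes ResNet [2] from a plain CNN adds a shortcut connection around a few stacked layers, so that those layers learn a residual mapping $\mathcal{F}$ instead of the full mapping. The abstract does not state which ResNet depth or configuration was used, so the equations below are the general form given in [2], not a description of the thesis model. Here $\mathbf{x}$ and $\mathbf{y}$ are the input and output of the block, $\{W_i\}$ are the weights of the stacked layers, and $\sigma$ is the ReLU nonlinearity (biases omitted).

```latex
% Residual block with an identity shortcut (He et al. [2]):
\mathbf{y} = \mathcal{F}(\mathbf{x}, \{W_i\}) + \mathbf{x},
\qquad \mathcal{F} = W_2\,\sigma(W_1 \mathbf{x}) \text{ for a two-layer block}
% When the dimensions of x and F differ, a linear projection W_s is applied on the shortcut:
\mathbf{y} = \mathcal{F}(\mathbf{x}, \{W_i\}) + W_s \mathbf{x}
```
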
The SGD dataset [3] was used in this thesis. The most challenging issue was that, in this dataset, the individual human orientation and group activity classes do not have a sufficient number and diversity of samples, which limits classification performance. To overcome these difficulties, joint and segment information was used in addition to group activity information. These sources of information were merged into the deep learning process through fusion, and the results were observed. The thesis shows that the success of the detection and classification stems from the choice of deep learning techniques.
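
The abstract says that joint and segment information was merged with group activity information through fusion, but it does not say at which stage or by which scheme. As a hedged illustration only, the Python sketch below shows two common options: early fusion, which concatenates per-stream feature vectors before a single classifier, and late fusion, which takes a weighted average of the class probabilities produced by separate classifiers. The stream names (`image`, `joints`, `segments`), the weights, the logits and the helper functions are hypothetical placeholders, not values or code from the thesis.

```python
import math
from typing import Dict, List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one stream's class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_fusion(features: Dict[str, Sequence[float]], order: Sequence[str]) -> List[float]:
    """Early fusion: concatenate per-stream feature vectors in a fixed order
    so that a single classifier can be trained on the joint representation."""
    fused: List[float] = []
    for name in order:
        fused.extend(features[name])
    return fused


def late_fusion(logits: Dict[str, Sequence[float]], weights: Dict[str, float]) -> List[float]:
    """Late fusion: weighted average of per-stream class probabilities.

    Every stream must score the same set of classes. The weights are
    normalised to sum to 1, so the result is again a probability vector."""
    total_w = sum(weights[name] for name in logits)
    num_classes = len(next(iter(logits.values())))
    fused = [0.0] * num_classes
    for name, scores in logits.items():
        if len(scores) != num_classes:
            raise ValueError(f"stream {name!r} has {len(scores)} classes, expected {num_classes}")
        probs = softmax(scores)
        w = weights[name] / total_w
        for k in range(num_classes):
            fused[k] += w * probs[k]
    return fused


def predict(probs: Sequence[float]) -> int:
    """Index of the most probable class."""
    return max(range(len(probs)), key=lambda k: probs[k])


# Hypothetical logits from three streams over three hypothetical group-activity classes.
streams = {
    "image": [2.0, 0.5, 0.1],
    "joints": [0.2, 1.5, 0.3],
    "segments": [1.0, 0.8, 0.2],
}
weights = {"image": 0.5, "joints": 0.3, "segments": 0.2}
fused_probs = late_fusion(streams, weights)
print(predict(fused_probs))
```

Early fusion lets a single classifier learn interactions between the streams, while late fusion keeps the streams independent, so each classifier can be trained or replaced separately.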

URI

http://hdl.handle.net/11655/5552

Collections

Bilgisayar Mühendisliği Bölümü Tez Koleksiyonu (Department of Computer Engineering Thesis Collection) [255]

References

[1] Krizhevsky A. Sutskever I. & Hinton G. E. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105. 2012.
[2] He K. Zhang X. Ren S. & Sun J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778. 2016.
[3] Choi W. Chao Y. W. Pantofaru C. & Savarese S. Discovering groups of people in images. In European Conference on Computer Vision, pages 417–433. 2014.
[4] Wang J. Song Y. Leung T. Rosenberg C. Wang J. Philbin J. Chen B. & Wu Y. Learning fine-grained image similarity with deep ranking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1386–1393. 2014.
[5] Bourdev L. & Malik J. Poselets: Body part detectors trained using 3D human pose annotations. In IEEE 12th International Conference on Computer Vision, pages 1365–1372. 2009.
[6] Singh S. Gupta A. & Efros A. A. Unsupervised discovery of mid-level discriminative patches. In Computer Vision–ECCV, pages 73–86. 2012.
[7] Dalal N. & Triggs B. Histograms of oriented gradients for human detection. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 886–893. 2005.
[8] He K. Gkioxari G. Dollár P. & Girshick R. Mask R-CNN. In Computer Vision (ICCV), pages 2980–2988. 2017.
[9] Chang C. C. & Lin C. J. LIBSVM: A library for support vector machines. In ACM Transactions on Intelligent Systems and Technology. 2011.
[10] Ke Q. Bennamoun M. An S. Sohel F. & Boussaid F. A new representation of skeleton sequences for 3D action recognition. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 4570–4579. 2017.
[11] Gers F. A. Schmidhuber J. & Cummins F. Learning to forget: Continual prediction with LSTM. 1999.
[12] Cao Z. Simon T. Wei S. E. & Sheikh Y. Realtime multi-person 2D pose estimation using part affinity fields. In CVPR, Vol. 1, No. 2, p. 7. 2017.
[13] Amer M. R. Xie D. Zhao M. Todorovic S. & Zhu S. C. Cost-sensitive top-down/bottom-up inference for multiscale activity recognition. In European Conference on Computer Vision, pages 187–200. 2012.
[14] Felzenszwalb P. F. Girshick R. B. McAllester D. & Ramanan D. Object detection with discriminatively trained part-based models. In IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 1627–1645. 2010.
[15] Laptev I. Marszalek M. Schmid C. & Rozenfeld B. Learning realistic human actions from movies. In CVPR. 2008.
[16] Deng Z. Zhai M. Chen L. Liu Y. Muralidharan S. Roshtkhari M. J. & Mori G. Deep structured models for group activity recognition. arXiv:1506.04191. 2015.
[17] Deng J. Dong W. Socher R. Li L. J. Li K. & Fei-Fei L. ImageNet: A large-scale hierarchical image database. In Computer Vision and Pattern Recognition (CVPR), pages 248–255. 2009.
[18] Jia Y. Shelhamer E. Donahue J. Karayev S. Long J. Girshick R. Guadarrama S. & Darrell T. Caffe: Convolutional architecture for fast feature embedding. arXiv:1408.5093. 2014.
[19] Choi W. Shahid K. & Savarese S. Learning context for collective activity recognition. In Computer Vision and Pattern Recognition (CVPR), pages 3273–3280. 2011.
[20] Toshev A. & Szegedy C. DeepPose: Human pose estimation via deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1653–1660. 2014.
[21] Sapp B. & Taskar B. MODEC: Multimodal decomposable models for human pose estimation. In CVPR. 2013.
[22] Johnson S. & Everingham M. Clustered pose and nonlinear appearance models for human pose estimation. In BMVC. 2010.
[23] Chao Y. W. Liu Y. Liu X. Zeng H. & Deng J. Learning to detect human-object interactions. arXiv:1702.05448. 2017.
[24] Girshick R. Fast R-CNN. arXiv:1504.08083. 2015.
[25] Ren S. He K. Girshick R. & Sun J. Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems, pages 91–99. 2015.
[26] https://keras.io/
[27] https://www.tensorflow.org/
[28] https://github.com/ildoonet/tf-pose-estimation