Deep Learning Architectures for Collective Activity Recognition

Zalluhoğlu, Cemil

dc.contributor.advisor	İkizler-Cinbiş, Nazlı
dc.contributor.author	Zalluhoğlu, Cemil
dc.date.accessioned	2019-10-21T12:40:57Z
dc.date.issued	2019-09-30
dc.date.submitted	2019-09-30
dc.identifier.uri	http://hdl.handle.net/11655/9433
dc.description.abstract	Collective activity recognition, which analyses the behavior of groups of people in videos, is an essential goal of video surveillance systems. In this thesis, we propose three new methods and one novel dataset for the collective activity recognition task. The first method is a new multi-stream convolutional neural network architecture that utilizes information extracted from multiple regions. It is the first work to use a multi-stream network and multiple regions for this problem. We explore various strategies for fusing multiple spatial and temporal streams. We evaluate the proposed method on two benchmark datasets, the Collective Activity Dataset and the Volleyball Dataset. Our experimental results show that the proposed method improves collective activity recognition performance compared to state-of-the-art approaches. While working on this problem, we found that the existing datasets are insufficient for deep learning methods and have many limitations. We therefore introduce the "Collective Sports (C-Sports)" dataset, a novel benchmark dataset for multi-task recognition of both collective activity and sports categories. Various state-of-the-art techniques are evaluated on this dataset, together with a multi-task variant that demonstrates increased performance. The experimental results indicate that, while sports category recognition is a relatively easy task, there is still room for improvement in collective activity recognition, especially in distant-view situations. We believe that the C-Sports dataset will stir further interest in this research direction. Our second proposed method involves an attention mechanism. We utilize a soft attention mechanism for the action recognition and collective activity recognition tasks. We use attention maps that have high response values in the regions of a video that need attention.
We describe a method that uses this attention mechanism with two distinct 3D-ConvNet architectures: standard 3D-ConvNets (C3D) and inflated 3D-ConvNets (I3D). We evaluate our method on four benchmark datasets. Two of them, UCF101 and HMDB51, address the action recognition task; the other two, the Collective Activity Dataset and the Collective Sports Dataset, address the collective activity recognition problem. Experimental results show that the 3D attention-based ConvNets improve performance on all datasets compared to the baselines, which are the same 3D-ConvNet architectures without an attention mechanism. The last method proposed in this thesis combines relational reasoning with a 3D attention mechanism. We propose a 3D spatio-temporal relation network. We build this architecture by adding new components step by step to the base Temporal Relation Network (TRN). First, a 2D attention mechanism is added to the TRN architecture. Then the 2D architecture is moved into 3D space. Finally, a 3D attention mechanism is added to the 3D TRN architecture. We evaluate these networks on one activity recognition dataset, Something-Something v1, and on three collective activity recognition datasets: the Collective Activity, Collective Sports, and Volleyball datasets. Our results show that the methods with an attention mechanism improve recognition performance. In addition, 3D networks obtain better accuracy than 2D networks.	tr_TR
dc.language.iso	en	tr_TR
dc.publisher	Fen Bilimleri Enstitüsü	tr_TR
dc.rights	info:eu-repo/semantics/openAccess	tr_TR
dc.subject	Action recognition	tr_TR
dc.subject	Attention mechanism	tr_TR
dc.subject	Collective activity recognition	tr_TR
dc.subject	Convolutional neural networks	tr_TR
dc.subject	Long short term memory (LSTM)	tr_TR
dc.subject	Multi-task learning	tr_TR
dc.title	Deep Learning Architectures for Collective Activity Recognition	tr_TR
dc.title.alternative	Kolektif Aktivite Tanıma için Derin Öğrenme Yapıları
dc.type	info:eu-repo/semantics/doctoralThesis	tr_TR
dc.description.ozet	Collective activity recognition, which analyzes the movements of groups of people in videos, is the main goal of video surveillance systems. Within the scope of this thesis, we propose three new solutions and one new dataset for the collective activity recognition problem. In the first method, we propose a new multi-stream convolutional neural network architecture that uses information extracted from multiple regions. The proposed method is the first work in this field to use a multi-stream network and multiple regions. Various strategies for fusing the spatial and temporal streams are investigated. We evaluate the proposed method on two benchmark datasets in this field, the Collective Activity dataset and the Volleyball dataset. Our experimental results show that the proposed method improves collective activity recognition performance compared to state-of-the-art approaches. While trying to solve this problem, we realized that the existing datasets are insufficient for deep learning methods and have many limitations. We then present the "Collective Sports (C-Sports)" dataset, a new dataset for multi-task recognition covering both collective activity and sports categories. Various techniques are evaluated on this dataset, together with a multi-task variant that shows increased performance. From the experimental results, we can say that although sports action category recognition is a relatively easy task, there is still room for improvement in collective activity recognition, especially in distant-view situations. We believe that the C-Sports dataset will attract more interest to this research area. Our second proposed method involves an attention mechanism. We use a soft attention mechanism for the action recognition and collective activity recognition tasks. We use attention maps that concentrate on the regions of a video that require attention.
We describe a method that uses this attention mechanism together with two separate 3D convolutional neural network architectures, standard 3D convolutional neural networks (C3D) and inflated 3D convolutional neural networks (I3D). We evaluate our method on four benchmark datasets. Two of them, UCF101 and HMDB51, concern the action recognition task. The others, the Collective Activity Dataset and the Collective Sports Dataset, concern the collective activity recognition problem. Experimental results show that the 3D attention-based convolutional neural networks improve performance in both domains (action recognition and collective activity recognition) on all evaluated datasets, compared to the 3D convolutional neural network architectures without an attention mechanism. The last method we propose in this thesis involves a relational reasoning method and a 3D attention mechanism. We propose a 3D spatio-temporal relation network with a 3D attention mechanism. We create this architecture by adding new methods step by step on top of the Temporal Relation Network (TRN). First, a 2D attention mechanism was added to the TRN architecture. Then the 2D architecture was moved into 3D space. Finally, a 3D attention mechanism was added to the 3D TRN architecture. We evaluate these networks on one activity recognition dataset and three collective activity recognition datasets: the Something-Something v1, Collective Activity, Collective Sports, and Volleyball datasets. Our results show that the methods with an attention mechanism improve recognition performance. They also show that methods with 3D networks achieve better accuracy than those with 2D networks.	tr_TR
dc.contributor.department	Computer Engineering	tr_TR
dc.embargo.terms	Open	tr_TR
dc.embargo.lift	2020-10-22T12:40:58Z
dc.funding	TÜBİTAK	tr_TR

Files in this item:

Name: 10303164.pdf
Size: 16.45 MB
Format: PDF

This item appears in the following collection(s):

Department of Computer Engineering Thesis Collection [212]
