
Dense Video Captioning By Utilizing Auxiliary Image Data

View/Open
Boran, Emre-yeni.pdf (7.136Mb)
Date
2020-07
Author
Boran, Emre
Access Status
Open access
Abstract
Dense video captioning aims to detect events in untrimmed videos and to generate an accurate and coherent caption for each detected event. It is one of the most challenging captioning tasks, since the generated sentences must form a meaningful and fluent paragraph that respects the temporal dependencies and ordering between events; most previous work relies heavily on visual features extracted from the videos. Collecting textual descriptions is especially costly for dense video captioning, since each event in the video must be annotated separately and a long descriptive paragraph must be provided. In this thesis, we investigate a way to mitigate this burden and propose a new dense video captioning approach that leverages the captions of similar images as auxiliary context while generating coherent captions for the events in a video. Our model retrieves visually relevant images and combines noun and verb phrases from their captions to generate coherent descriptions. We employ a generator-discriminator design, together with an attention-based fusion technique, to incorporate image captions as context in the video caption generation process. The best generated caption is chosen by a hybrid discriminator that considers temporal and semantic dependencies between events. The effectiveness of our model is demonstrated on the ActivityNet Captions dataset, where our approach achieves favorable performance compared to a strong baseline on both automatic metrics and qualitative evaluations.
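To make the attention-based fusion step concrete, below is a minimal PyTorch sketch of how a decoder state might attend over retrieved image-caption phrase embeddings and fuse the result back into the state. All module names, dimensions, and the dot-product scoring function here are illustrative assumptions, not the thesis' actual architecture.

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Illustrative sketch: attend over retrieved phrase embeddings and
    fuse the attended context with the decoder hidden state. Dimensions
    and layer choices are assumptions, not the thesis' implementation."""

    def __init__(self, hidden_dim: int, phrase_dim: int):
        super().__init__()
        self.query = nn.Linear(hidden_dim, phrase_dim)   # project state to phrase space
        self.fuse = nn.Linear(hidden_dim + phrase_dim, hidden_dim)

    def forward(self, hidden: torch.Tensor, phrases: torch.Tensor) -> torch.Tensor:
        # hidden:  (batch, hidden_dim)            decoder state for the current event
        # phrases: (batch, n_phrases, phrase_dim) noun/verb phrase embeddings
        q = self.query(hidden).unsqueeze(1)              # (batch, 1, phrase_dim)
        scores = (q * phrases).sum(-1)                   # dot-product attention scores
        weights = torch.softmax(scores, dim=-1)          # attention over phrases
        context = (weights.unsqueeze(-1) * phrases).sum(1)  # weighted phrase context
        return torch.tanh(self.fuse(torch.cat([hidden, context], dim=-1)))

# Toy usage: fuse a decoder state with 5 retrieved phrase embeddings.
fusion = AttentionFusion(hidden_dim=512, phrase_dim=300)
h = torch.randn(2, 512)
p = torch.randn(2, 5, 300)
out = fusion(h, p)
print(out.shape)  # torch.Size([2, 512])
```

The fused state would then feed the caption generator at each decoding step; the hybrid discriminator described above operates downstream, scoring whole candidate captions.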
URI
http://hdl.handle.net/11655/22724
Collections
  • Bilgisayar Mühendisliği Bölümü Tez Koleksiyonu [162]