Turkish Video Captioning with Msvd-Turkish Dataset

Çıtamak, Begüm

dc.contributor.advisor	Erdem, Mehmet Erkut
dc.contributor.author	Çıtamak, Begüm
dc.date.accessioned	2020-09-17T10:47:39Z
dc.date.issued	2020
dc.date.submitted	2020-05-28
dc.identifier.uri	http://hdl.handle.net/11655/22778
dc.description.abstract	Video captioning is the task of describing the content of a video in natural language, based on information extracted from it, so that a person can identify the video from the description. The problem lies at the intersection of computer vision and natural language processing. It is also a difficult problem for machines: models have achieved considerable success for English, but there are no models or results for Turkish, because no Turkish dataset for video captioning has been available until now. In this thesis, the MSVD (Microsoft Research Video Description Corpus) dataset, which has been used in many video captioning studies, has been carefully translated into Turkish. Translating from English to Turkish with pre-trained translation models such as the Google API was observed to produce noisy data, and it was concluded that data this noisy could not serve as a dataset for training models on this challenging problem. Therefore, all of the data in the MSVD-Turkish dataset was checked and translated into Turkish manually. In this way, a Turkish dataset parallel to the English MSVD was created. Experiments were also performed to assess the usability of the MSVD-Turkish dataset. The models used in these experiments are ones previously used for English video captioning. In keeping with the nature of the problem, experiments were carried out with sequence-to-sequence long short-term memory (LSTM) models and with models that add attention mechanisms to them. The performance of all experiments carried out with different segmentation methods on the MSVD-Turkish dataset was also compared. 
Finally, in addition to creating a Turkish video captioning dataset, this study takes the first steps for future work in the field of video captioning. The experiments yield baseline results and are intended to guide the development of new models specific to Turkish. With this dataset, contributed to the literature under the name MSVD-Turkish, further Turkish-specific studies on video captioning are expected, along with improved Turkish performance on the problem.	tr_TR
dc.language.iso	en	tr_TR
dc.publisher	Fen Bilimleri Enstitüsü	tr_TR
dc.rights	info:eu-repo/semantics/openAccess	tr_TR
dc.subject	Computer vision	tr_TR
dc.subject	Natural language processing	tr_TR
dc.subject	Video captioning	tr_TR
dc.subject	Long short-term memory (LSTM)	tr_TR
dc.subject	Attention mechanism	tr_TR
dc.subject	Segmentation	tr_TR
dc.subject.lcsh	Computer engineering	tr_TR
dc.title	Turkish Video Captioning with Msvd-Turkish Dataset	tr_TR
dc.type	info:eu-repo/semantics/masterThesis	tr_TR
dc.description.ozet	The video captioning problem can be defined as describing a video in natural language, the way a human would describe the footage, by extracting information from the given videos. For this difficult problem, one of the shared topics of computer vision and natural language processing under active study today, a certain level of success has been reached for English, while no successful results have yet been obtained for Turkish. The main reason is the lack of any Turkish dataset for the video captioning problem. In this thesis, the MSVD (Microsoft Research Video Description Corpus) dataset, which has been used extensively in video captioning and is still used in current studies, was carefully translated into Turkish. It was observed that performing the translation with off-the-shelf English-Turkish translation models such as the Google API produces noisy data, and it was concluded that training a model on this data could not yield a dataset for this challenging problem. For this reason, all of the data in the dataset was checked and translated into Turkish. In this way, a benchmark dataset for Turkish was created in parallel with the English version. Experiments were also carried out to assess the usability of the resulting dataset. The experiments used models previously employed in English video captioning studies. In keeping with the nature of the problem, experiments were conducted with sequence-to-sequence Long Short-Term Memory (LSTM) models and with models built by adding an attention mechanism to them. The performance of all experiments carried out with different segmentation methods on the resulting MSVD-Turkish dataset was also compared. 
As a result of the work carried out in this thesis, besides contributing a Turkish video captioning dataset to the literature, the first steps were taken for future studies in the field of video captioning. The experiments produced baseline results that are meant to guide the development of new models. Thanks to this dataset, which enters the literature under the name MSVD-Turkish, new Turkish-specific studies on the video captioning problem are expected to follow, and Turkish performance on the problem is expected to improve.	tr_TR
dc.contributor.department	Bilgisayar Mühendisliği	tr_TR
dc.embargo.terms	Open access	tr_TR
dc.embargo.lift	2020-09-17T10:47:39Z
dc.funding	TÜBİTAK	tr_TR

Files in this item

Name: TURKISH VIDEO CAPTIONING WITH ...
Size: 27.36Mb
Format: PDF

This item appears in the following Collection(s)

Bilgisayar Mühendisliği Bölümü Tez Koleksiyonu [267]
