Turkish Video Captioning with the MSVD-Turkish Dataset
Abstract
Video captioning can be defined as extracting information from a given video and describing its content in natural language in such a way that a person could identify the video from the description. The problem belongs to computer vision as well as to natural language processing, which places it at the intersection of the two domains. It is a difficult problem for machines. Considerable success has been achieved for English, but there are no models or results for Turkish, because no Turkish dataset for video captioning has been available.
In this thesis, the MSVD (Microsoft Research Video Description Corpus) dataset, which has been used in many studies on video captioning, has been carefully translated into Turkish. It was observed that translating from English to Turkish with pre-trained translation models such as Google API produces noisy data, and that such noisy data could not serve as a reliable dataset for training models on this challenging problem. Therefore, all of the data in the MSVD-Turkish dataset was checked and translated into Turkish manually. In this way, a Turkish dataset was created in parallel with the English version of MSVD.
Experiments were also performed to assess the usability of the MSVD-Turkish dataset. The models used in these experiments are among those used to solve the English video captioning problem. In keeping with the nature of the video captioning problem, experiments were carried out using sequence-to-sequence long short-term memory (LSTM) networks. Further experiments were conducted with models that add attention mechanisms to these sequence-to-sequence LSTM models. The performance of all experiments, which were carried out using different segmentation methods on the MSVD-Turkish dataset, is also compared.
Finally, in addition to creating a Turkish video captioning dataset, this study takes the first steps toward future studies in the field of video captioning. The experiments yield initial results and offer guidance for the development of new models specific to Turkish. With this dataset, contributed to the literature under the name MSVD-Turkish, further studies specific to the Turkish language are expected to follow for the video captioning problem, and performance on Turkish is expected to improve.