Generating Synthetic Data with Game Engines for Deep Learning Applications
Özet
In this thesis, we present a novel method that aims to generate labeled synthetic data for use in training drone detection deep learning models. Modern deep learning methods provide state-of-the-art performance on object detection problems and this performance is increasing thanks to the development of deep learning architectures day by day. The biggest obstacle to this development is the need for large amounts of labeled data for deep learning methods to be successful. Accessible datasets, on the other hand, generally do not contain sufficient data, have incorrect labels and bias. Datasets are traditionally made manually by humans. This human-made process is quite time-consuming and prone to human error. Although it is very difficult to find domain-specific data sets, it is almost impossible to reach data sets for specific problems One of the areas where there is a data shortage is drone detection. Drones are becoming more and more common day by day due to the development of drone technology and decreasing hardware costs. These objects cause security and privacy problems, so the detection of these objects gains great importance. Since drones can be found in any environment due to their nature, a data set with high data amount and variety covering all conditions were needed to train a model that will detect small or hard to visible drones. To solve this data shortage in drone detection, a novel method that can generate synthetic data with game engines is presented. The created method can generate training-ready, labeled images in a randomized manner using two-dimensional backgrounds and three-dimensional drone models. Also, an ablation study was carried out to optimize the synthetic data generated within the scope of this thesis, and several trainings were carried out to compare it with the actual data performance. The developed method can quickly generate synthetic images with photo-realistic labels, and the models trained with these datasets perform performance close to the models trained with real data.