Synthetic Data Generation for Training and Evaluating Deep Learning-Based Computer Vision Models
Date
2021-06-07
Author
Kerim, Abdulrahman
Embargo Period
Open access
Abstract
The recent success of the computer vision field in solving high-level vision tasks such as visual object tracking, semantic segmentation, instance segmentation, and optical flow recognition depends predominantly on the availability of large-scale datasets, which are critical for training and testing new algorithms. Manually annotating visual data, however, is not only time-consuming but also prone to errors and subject to privacy issues. In this work, we present NOVA, a general-purpose framework for creating 3D virtual worlds populated with humans that provides pixel-level accurate ground truth annotations for many computer vision tasks. NOVA can simulate several environmental factors, such as weather conditions and different times of day, and bring an exceptionally diverse and photo-realistic set of humans to life, each with a distinct appearance and features.
To demonstrate NOVA’s capabilities, we used our framework to generate photo-realistic and diverse synthetic sequences for training and testing visual object tracking algorithms. The main motivation was to show that the synthetic data generated by our rendering engine constitute a good proxy for their real-world counterpart and can be deployed to boost the performance of learning-based computer vision models. In particular, our aim was to demonstrate the usability of our generated data for both training and testing computer vision models.
First, we generate two different synthetic datasets for the task of pedestrian tracking. The first dataset is used to assess the performance of several state-of-the-art visual trackers under various conditions. The second is used to train deep visual trackers to improve their performance on real sequences. Our study reveals that the tested trackers perform poorly in highly crowded scenes, at low illumination, and in foggy weather conditions. Additionally, the experiments demonstrate that our generated synthetic sequences are indeed a good proxy for real sequences and that they improve the performance of deep visual trackers under standard and normal conditions.
Following this, the essential question that emerged, and that required thorough experiments, was whether our synthetic data can complement real-world data and push the limits of currently available visual object tracking datasets.
Bearing in mind the poor performance of recent tracking algorithms under certain challenging conditions (as revealed by our previous experiments), we considered adverse weather conditions in more detail. We provided a new person tracking dataset of real-world sequences (PTAW172Real), captured under foggy, rainy and snowy weather conditions, to assess the performance of current trackers. The considered trackers, whether correlation filter-based or learning-based, performed poorly under these adverse weather conditions. Our experimental results link this deficiency to the lack of sufficient adverse weather training samples in current visual object tracking datasets. To mitigate the problem, we extended our rendering engine to simulate more realistic adverse weather conditions spanning foggy, rainy and snowy weather. Pedestrians in rainy and snowy weather are simulated with outdoor cold-weather clothes. Snow banks and water puddles are simulated to account for snow and water accumulation, respectively. Additionally, snow particles and rain drops are generated to match real-life videos. In parallel, snow tracks left by cars and pedestrians are simulated for added realism. Pedestrians are randomly assigned umbrellas, and a suitable animation is set accordingly. Fog is simulated using post-processing effects and the Enviro system. The severity of each weather condition is randomized at run time to add diversity to the generated sequences.
Following this, and harnessing the photo-realism and diversity of the simulated adverse weather conditions, we provide a novel person tracking dataset of synthetic sequences (PTAW217Synth), generated by our NOVA framework and spanning the same adverse weather conditions. The results demonstrate that the performance of deep trackers under adverse weather conditions can be improved when our synthetically generated sequences are used for training.
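
As an illustration of the run-time randomization of weather severity mentioned in the abstract, the following is a minimal Python sketch and not NOVA's actual implementation. The names `WeatherConfig`, `sample_weather` and `sample_sequence_configs`, the 0 to 1 severity scale, and the uniform sampling are all assumptions made for this example; the abstract does not state how NOVA parameterizes or samples severity.

```python
import random
from dataclasses import dataclass

# Weather types named in the abstract; the string labels are illustrative.
WEATHER_TYPES = ("fog", "rain", "snow")


@dataclass(frozen=True)
class WeatherConfig:
    """Hypothetical per-sequence weather settings (not NOVA's real data model)."""
    kind: str        # one of WEATHER_TYPES
    severity: float  # assumed scale: 0.0 (mildest) to 1.0 (most severe)


def sample_weather(rng: random.Random) -> WeatherConfig:
    """Draw one weather type and one severity value uniformly at random."""
    kind = rng.choice(WEATHER_TYPES)
    severity = rng.uniform(0.0, 1.0)
    return WeatherConfig(kind=kind, severity=severity)


def sample_sequence_configs(n: int, seed: int) -> list[WeatherConfig]:
    """Sample one weather configuration per generated sequence.

    A fixed seed makes the sampled configurations reproducible.
    """
    rng = random.Random(seed)
    return [sample_weather(rng) for _ in range(n)]


if __name__ == "__main__":
    for config in sample_sequence_configs(5, seed=0):
        print(f"{config.kind}: severity {config.severity:.2f}")
```

Sampling from a seeded generator, as assumed here, would let the same set of weather configurations be regenerated later, which is convenient when a synthetic dataset needs to be rebuilt or extended.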