Addressing the Data Diversity Gap With Uniquely Generated Synthetic Videos for Real-World Human Action Recognition
Abstract
Recognition of human actions using machine learning requires substantial datasets to develop robust models. However, obtaining such real-world data is challenging because collection is costly and time-consuming. In addition, existing datasets mostly contain indoor videos due to the difficulty of capturing pose data outdoors. Synthetic data have been employed to overcome these difficulties, yet currently available synthetic data lack both photorealism and feature diversity. In this paper, we present the NOVAction engine for generating photorealistic synthetic human action sequences captured from diverse viewpoints to improve action recognition performance. We use NOVAction to create the NOVAction23 dataset comprising 25,415 human action sequences (available at \url{https://graphics.cs.hacettepe.edu.tr/NOVAction}). In NOVAction23, the performed motions and the viewpoints are varied on the fly through procedural generation, so that, for a given animation class, each generated sequence features a unique motion performed by one of 1,105 synthetic humans and captured from a unique viewpoint. We evaluate NOVAction23 by training three state-of-the-art recognizers on it in addition to the NTU 120 dataset. Our results are further validated on real-world videos from YouTube. The findings confirm that the NOVAction23 dataset can enhance the performance of state-of-the-art video classification models for human action recognition.