Audio Classification with Few-Shot Learning
Date
2024-09-24
Author
Çiğdem, Enes Furkan
Open access
Abstract
This thesis presents a comprehensive experimental study of few-shot classification in the audio domain, comparing the effectiveness of episodic and non-episodic training methods. Three different optimization strategies are used for non-episodic training, and the effect of each training technique on classification performance is investigated. For these comparisons, simple feature transformations are also employed to improve performance, and their effect on performance is analyzed.
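In episodic training, each training step is built from a small N-way K-shot task (an "episode") sampled from the training set. Non-episodic training instead optimizes over ordinary mini-batches drawn from the full label set. The sketch below shows a generic episode sampler to make that distinction concrete. The helper name `sample_episode` and the values chosen for `n_way`, `k_shot`, and `n_query` are illustrative assumptions, not the configuration used in the thesis.

```python
import random
from collections import defaultdict


def sample_episode(labels, n_way, k_shot, n_query, rng=None):
    """Sample one N-way K-shot episode (hypothetical helper, generic technique).

    Returns (support, query): lists of (dataset_index, episode_label) pairs,
    where episode_label is the class's position 0..n_way-1 in this episode.
    """
    rng = rng or random.Random()
    # Group dataset indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    # Only classes with enough clips for both support and query are eligible.
    eligible = [c for c, idxs in by_class.items() if len(idxs) >= k_shot + n_query]
    if len(eligible) < n_way:
        raise ValueError("not enough classes with k_shot + n_query examples")
    classes = rng.sample(eligible, n_way)
    support, query = [], []
    for episode_label, cls in enumerate(classes):
        # Draw disjoint support and query examples for this class.
        picked = rng.sample(by_class[cls], k_shot + n_query)
        support += [(i, episode_label) for i in picked[:k_shot]]
        query += [(i, episode_label) for i in picked[k_shot:]]
    return support, query


# Hypothetical labels: 10 classes with 20 clips each.
labels = [i % 10 for i in range(200)]
support, query = sample_episode(labels, n_way=5, k_shot=5, n_query=3, rng=random.Random(0))
print(len(support), len(query))  # 25 15
```

Episode labels are remapped to 0..N-1 so that a model trained episodically learns to compare clips within a task rather than memorize global class identities.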
The few-shot audio classification experiments are conducted in limited-data scenarios using two distinct data sets: Environmental Sound Classification 50 (ESC-50) and Google Speech Commands (GSC). ESC-50 contains environmental, non-speech sounds, while GSC consists of basic spoken commands. For each data set, three scenarios with constrained training data are constructed by selecting 5, 10, and 15 samples per class. With these training sets, a series of comprehensive non-episodic experiments is conducted using three optimization strategies: single-stage hybrid loss optimization (SSHLO), single-stage loss optimization (SSLO), and two-stage loss optimization (TSLO). The results of the three optimizations are then compared with each other and with episodic training.
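The limited-data scenarios can be illustrated with a short sketch that keeps k clips per class from a labelled set. The abstract does not state how the 5, 10, or 15 samples were selected, so uniform random sampling without replacement under a fixed seed is an assumption here, and `few_shot_subset` is a hypothetical helper. The example labels only mimic ESC-50's layout of 50 classes with 40 clips each.

```python
import random
from collections import defaultdict


def few_shot_subset(labels, k, seed=0):
    """Return sorted dataset indices keeping at most k samples per class.

    Assumption: uniform random selection without replacement with a fixed
    seed; the thesis abstract does not specify the selection procedure.
    """
    rng = random.Random(seed)
    # Group dataset indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    chosen = []
    for idxs in by_class.values():
        # Take k clips, or every clip if the class is smaller than k.
        chosen += rng.sample(idxs, min(k, len(idxs)))
    return sorted(chosen)


# Hypothetical labels laid out like ESC-50: 50 classes x 40 clips.
labels = [i // 40 for i in range(50 * 40)]
subsets = {k: few_shot_subset(labels, k) for k in (5, 10, 15)}
for k, idxs in subsets.items():
    print(k, len(idxs))  # 5 250, then 10 500, then 15 750
```

Fixing the seed makes each scenario reproducible across the compared optimization strategies, so that differences in results are not caused by different training clips.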
Our findings indicate that, when used with a pre-trained model, the non-episodic training approach is more effective than the episodic training approach in the audio domain.
Among the optimizations, the results show that single-stage hybrid loss optimization (SSHLO) performs best on both data sets.