Comparative Study on Music Source Separation Methods

Date
2022
Author
Baysal, Burak
Access
Open access
Abstract
Blind source separation is the task of recovering source signals from a mixture signal. "Blind" means that no prior knowledge of the sources or the mixing environment is available. The problem has been studied in the literature for a long time, and its most familiar example is the "Cocktail Party Problem." Imagine that the sound of a party is recorded. The recording comprises audio signals such as speech, laughter, music, and even footsteps from the street. Is it possible to extract one of the source signals, for example the music, from this mixture? Blind source separation methods aim to recover the original signals with the least possible loss.
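The mixing setup described above can be illustrated with a small sketch. It assumes a linear, instantaneous mixture of two sources, which is a simplification the abstract does not state; the signals, the matrix `A`, and the helper names `mix` and `unmix_with_known_matrix` are all hypothetical. The sketch inverts a known mixing matrix only to show what a blind method has to estimate from the mixtures alone.

```python
import math

# Two hypothetical sources: a sine tone and a square wave.
n = 1000
s1 = [math.sin(2 * math.pi * 5 * t / n) for t in range(n)]
s2 = [1.0 if (t // 50) % 2 == 0 else -1.0 for t in range(n)]

# Hypothetical 2x2 mixing matrix. In the blind setting this is unknown.
A = [[1.0, 0.5],
     [0.3, 1.0]]

def mix(A, s1, s2):
    """Return the two observed mixtures x = A s, sample by sample."""
    x1 = [A[0][0] * a + A[0][1] * b for a, b in zip(s1, s2)]
    x2 = [A[1][0] * a + A[1][1] * b for a, b in zip(s1, s2)]
    return x1, x2

def unmix_with_known_matrix(A, x1, x2):
    """Recover the sources by applying the inverse of a known 2x2 matrix."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    r1 = [(A[1][1] * a - A[0][1] * b) / det for a, b in zip(x1, x2)]
    r2 = [(-A[1][0] * a + A[0][0] * b) / det for a, b in zip(x1, x2)]
    return r1, r2

x1, x2 = mix(A, s1, s2)
r1, r2 = unmix_with_known_matrix(A, x1, x2)

# Largest reconstruction error across both recovered sources.
max_error = max(
    max(abs(a - b) for a, b in zip(s1, r1)),
    max(abs(a - b) for a, b in zip(s2, r2)),
)
```

A blind method sees only `x1` and `x2`. It has to estimate an unmixing matrix from statistical properties of the mixtures, so it typically recovers the sources only up to scaling and ordering.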
Initially, statistical and computational approaches were dominant in the literature. Independent component analysis methods were widely used in early blind source separation studies. Following these approaches, which are based on matrix factorization, methods involving more complex calculations emerged, such as the Degenerate Unmixing Estimation Technique (DUET). Recently, machine learning-based approaches have become dominant in the literature, and deep learning methods are now widely used for separating signals.
This thesis aims to comprehensively compare methods for blind source separation. In addition to long-established techniques from the literature, the comparative study includes deep learning-based models that are used effectively today. Seven source separation methods are studied: the classical methods FastICA, NMF, and DUET, and the machine learning-based models Open Unmix, Spleeter, Wave-U-Net, and Hybrid Demucs. After the source separation methods are described in detail, an experimental study is carried out on the MusDB18-HQ dataset. In the experiment, audio signals are analyzed and separated into four components: vocals, drums, bass, and other. The performance of each method is evaluated with the SDR metric. The methods are also evaluated by music genre, and these results are included in the experimental results of the thesis.
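As a rough illustration of the evaluation metric, the sketch below computes SDR (commonly expanded as signal-to-distortion ratio) in its simplest form, 10 log10 of reference energy over error energy. This is an assumption about the definition: the abstract does not say which SDR variant or toolkit the thesis uses, and the function name `sdr_db` and the toy signals are hypothetical.

```python
import math

def sdr_db(reference, estimate, eps=1e-10):
    """Simple SDR in dB for two equal-length sample sequences."""
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    signal_energy = sum(r * r for r in reference)
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    # eps guards against division by zero and log of zero.
    return 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))

# Toy example: a 440 Hz tone and an estimate with a 10% amplitude error.
sample_rate = 8000
reference = [math.sin(2 * math.pi * 440 * k / sample_rate) for k in range(sample_rate)]
estimate = [0.9 * r for r in reference]

score = sdr_db(reference, estimate)  # error is 0.1 * reference, so about 20 dB
```

Higher values mean the estimate is closer to the reference stem. In a four-stem setting, a score like this would be computed separately for vocals, drums, bass, and other.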