Comparative Study on Music Source Separation Methods

Baysal, Burak

dc.contributor.advisor	Efe, Mehmet Önder
dc.contributor.author	Baysal, Burak
dc.date.accessioned	2023-05-22T10:19:19Z
dc.date.issued	2022
dc.date.submitted	2022-12-27
dc.identifier.uri	https://hdl.handle.net/11655/33216
dc.description.abstract	Blind source separation is the task of recovering the individual source signals from a mixture signal. "Blind" means that no prior knowledge of the sources or of the mixing environment is available. Blind source separation has been studied in the literature for a long time, and the best-known example of the problem is the "Cocktail Party Problem." Imagine recording the ambient sound at a party: the recorded audio signal is a mixture of speech, laughter, music, and even footsteps from the street. Is it possible to extract a single source signal, such as the music, from this mixture? Blind source separation methods aim to recover the original signals with as little loss as possible. Early work was dominated by statistical and computational approaches, and independent component analysis (ICA) methods were widely used in early blind source separation studies. These matrix-factorization-based approaches were followed by methods involving more complex computations, such as the Degenerate Unmixing Estimation Technique (DUET). More recently, machine learning-based approaches have become dominant in the literature, and deep learning methods are now widely used to separate signals. This thesis aims to compare methods for the blind source separation problem comprehensively. In addition to long-established techniques, deep learning-based models that are used effectively in today's technologies are included in the comparative study. Seven source separation methods are studied: the classical methods FastICA, NMF, and DUET, and the machine learning-based models Open Unmix, Spleeter, Wave-U-Net, and Hybrid Demucs.
After a detailed description of the source separation methods, an experimental study was carried out on the MusDB18-HQ dataset. In the experiment, audio signals were analyzed and separated into four components: vocals, drums, bass, and other. The performance of each method was evaluated with the signal-to-distortion ratio (SDR) metric, and the results were also broken down by music genre.	tr_TR
dc.language.iso	en	tr_TR
dc.publisher	Fen Bilimleri Enstitüsü	tr_TR
dc.rights	info:eu-repo/semantics/openAccess	tr_TR
dc.subject	Blind source separation	tr_TR
dc.subject	Music source separation	tr_TR
dc.subject	Music information retrieval	tr_TR
dc.title	Comparative Study on Music Source Separation Methods	tr_TR
dc.type	info:eu-repo/semantics/masterThesis	tr_TR
dc.description.ozet	Blind source separation is a problem domain that has long been studied in the literature. The best-known example of the domain is the "Cocktail Party Problem." The problem describes a party: if the ambient sound is recorded, the recorded signal will be a mixture of audio signals such as speech, laughter, and music. Is it possible to extract the source signals, for example the audio signal of the music, from this mixture? Blind source separation methods aim to recover the original signals from the mixture with as little loss as possible. Statistical and computational approaches initially dominated the literature, and independent component analysis methods were widely used in early blind source separation studies. These matrix-factorization-based approaches were followed by methods involving more complex computations, such as the Degenerate Unmixing Estimation Technique. More recently, machine learning-based approaches have become dominant, and deep learning methods are now used extensively to separate signals. This thesis aims to provide a comprehensive comparison of methods for the blind source separation problem. Alongside long-established methods, deep learning-based models used effectively by today's technologies are included in the comparative study. Seven source separation methods are studied: the classical methods FastICA, NMF, and DUET, and the machine learning-based models Open Unmix, Spleeter, Wave-U-Net, and Hybrid Demucs. After a detailed description of the source separation methods, an experimental study was carried out.
In this experiment, audio signals were analyzed and separated into four components (vocals, drums, bass, and other), and the performance of each method was evaluated with the SDR metric. The methods were also evaluated by music genre, and these results were added to the experimental results of the thesis.	tr_TR
dc.contributor.department	Bilgisayar Mühendisliği	tr_TR
dc.embargo.terms	Open access	tr_TR
dc.embargo.lift	2023-05-22T10:19:19Z
dc.funding	None	tr_TR
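
The cocktail party problem described in the abstract can be illustrated with a toy instantaneous mixture X = A·S of two synthetic sources, separated by FastICA, one of the classical methods the thesis compares. The sketch below is a minimal NumPy version of symmetric FastICA with the tanh nonlinearity, written for illustration under these assumptions; it is not the implementation used in the thesis (this record does not describe one), and the function names `whiten`, `sym_decorrelate`, and `fastica`, the sources, and the mixing matrix are all hypothetical. Real MusDB18-HQ tracks are stereo mixtures of four stems, so the actual task is underdetermined and far harder than this idealised two-source case.

```python
import numpy as np


def whiten(X):
    """Center X (channels x samples) and whiten it so its covariance is the identity."""
    Xc = X - X.mean(axis=1, keepdims=True)
    cov = Xc @ Xc.T / Xc.shape[1]
    eigvals, eigvecs = np.linalg.eigh(cov)
    whitening = eigvecs @ np.diag(1.0 / np.sqrt(eigvals)) @ eigvecs.T
    return whitening @ Xc


def sym_decorrelate(W):
    """Symmetric orthogonalisation: W <- (W W^T)^(-1/2) W."""
    s, u = np.linalg.eigh(W @ W.T)
    return u @ np.diag(1.0 / np.sqrt(s)) @ u.T @ W


def fastica(X, n_iter=200, tol=1e-6, seed=0):
    """Hypothetical symmetric FastICA (g = tanh); sources are recovered up to
    permutation, sign, and scale."""
    Z = whiten(X)
    n, m = Z.shape
    rng = np.random.default_rng(seed)
    W = sym_decorrelate(rng.standard_normal((n, n)))
    for _ in range(n_iter):
        G = np.tanh(W @ Z)                        # g(w_i^T z) for every row w_i
        g_prime_mean = (1.0 - G ** 2).mean(axis=1)  # E[g'(w_i^T z)]
        # Fixed-point update for all rows: w <- E[z g(w^T z)] - E[g'(w^T z)] w
        W_new = sym_decorrelate(G @ Z.T / m - g_prime_mean[:, None] * W)
        # Converged when every new row is (anti)parallel to the previous one.
        converged = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0)) < tol
        W = W_new
        if converged:
            break
    return W @ Z


# Toy "cocktail party": two synthetic sources recorded by two microphones.
t = np.arange(4000) / 4000.0
S = np.vstack([np.sin(2 * np.pi * 5 * t),            # tone
               np.sign(np.sin(2 * np.pi * 3 * t))])  # square wave
A = np.array([[1.0, 0.5],
              [0.6, 1.0]])                           # hypothetical mixing matrix
X = A @ S
S_hat = fastica(X)

# |correlation| between each true source (rows) and each estimate (columns).
corr = np.abs(np.corrcoef(np.vstack([S, S_hat]))[:2, 2:])
print(np.round(corr, 3))
```

ICA identifies sources only up to permutation, sign, and scale, which is why the check compares absolute correlations rather than the raw waveforms.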
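
The abstract states that each method was scored with the SDR metric. As a rough illustration only, the sketch below computes a simplified whole-signal SDR, 10·log10(‖s‖² / ‖s − ŝ‖²), with NumPy. This definition is an assumption: the standard MusDB18 evaluation tooling (museval) implements the BSS Eval v4 SDR, which additionally allows the reference to be filtered before the error is measured and is usually reported as a median over one-second frames, and this record does not say which variant the thesis used. The function name `sdr` and the toy signals are hypothetical.

```python
import numpy as np


def sdr(reference, estimate, eps=1e-10):
    """Simplified signal-to-distortion ratio in dB (hypothetical helper).

    SDR = 10 * log10(||s||^2 / ||s - s_hat||^2), computed over the whole signal.
    This is NOT the BSS Eval v4 SDR: no distortion filter, no framewise median.
    """
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    signal_power = np.sum(reference ** 2)
    error_power = np.sum((reference - estimate) ** 2)
    # eps avoids division by zero / log of zero for silent stems or perfect estimates.
    return 10.0 * np.log10((signal_power + eps) / (error_power + eps))


# Toy check on one second of a 440 Hz tone at 44.1 kHz (synthetic data).
sr = 44100
t = np.arange(sr) / sr
s = np.sin(2 * np.pi * 440 * t)

print(round(sdr(s, 0.9 * s), 2))  # error amplitude is 10% of the signal -> 20.0
noisy = s + 0.5 * np.random.default_rng(0).standard_normal(sr)
print(round(sdr(s, noisy), 2))    # roughly 3 dB: noise power is about half the signal power
```

Higher is better; in a four-stem evaluation such as the one described above, a score like this would be computed separately for vocals, drums, bass, and other.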

Files in this item:

Name: Thesis.pdf
Size: 2.225 MB
Format: PDF
Description: BurakBaysalThesis


This item appears in the following collection(s):

Bilgisayar Mühendisliği Bölümü Tez Koleksiyonu (Department of Computer Engineering Thesis Collection)
