ABSTRACT


ESTIMATION OF CUSTOMER SATISFACTION USING MACHINE LEARNING ALGORITHMS OVER MODEM


YASİN SARI


M.Sc. Thesis, Department of Statistics
Supervisor: Asst. Prof. Dr. İbrahim Zor
August 2024, 70 pages


The KNIME Analytics Platform was used throughout all processes, including data transfer, parsing, and algorithm testing. Modem data was analyzed weekly, and download-upload data was categorized and evaluated across six different time slots. For classification analysis, AutoML was utilized, assessing algorithms such as Naive Bayes, Logistic Regression, Neural Networks, Gradient Boosted Trees, Decision Trees, Random Forest, and XGBoost. The libraries and platforms used include H2O software for Generalized Linear Models, the Keras library for Deep Learning, and H2O AutoML for various other algorithms.


The aim of this study is to identify dissatisfied customers. Because the dataset is imbalanced, several sampling methods were used. Labels were derived from modems with faulty signal information and from subscribers who filed a service complaint. The model was improved by reducing the data to four principal components using Principal Component Analysis (PCA) and then enriching it with the Synthetic Minority Over-sampling Technique (SMOTE). Tree-based algorithms yielded better results in solving the classification problem on imbalanced data. Algorithms were evaluated based on the geometric mean of Sensitivity (TPR) and Specificity (TNR), the weighted average (WPN), and Bookmaker Informedness (BM) criteria. Because the results were close, the False Positive (FP) rate was chosen as the final criterion, in order to limit the cost of investing in customers flagged as dissatisfied. XGBoost provided the best results among the ten algorithms applied.
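As an illustration of the evaluation criteria named above, the following minimal sketch computes the geometric mean of Sensitivity and Specificity, Bookmaker Informedness, and the False Positive rate from the four cells of a confusion matrix. The function names and the example counts are hypothetical and are not taken from the thesis; the formulas are the standard definitions, G-mean = sqrt(TPR × TNR) and BM = TPR + TNR − 1. The weighted average (WPN) criterion is omitted because its weights are not stated in this abstract.

```python
import math


def rates(tp, fn, tn, fp):
    """Return (TPR, TNR) from confusion-matrix counts."""
    tpr = tp / (tp + fn)  # Sensitivity: share of dissatisfied customers detected
    tnr = tn / (tn + fp)  # Specificity: share of satisfied customers left unflagged
    return tpr, tnr


def g_mean(tp, fn, tn, fp):
    """Geometric mean of Sensitivity and Specificity."""
    tpr, tnr = rates(tp, fn, tn, fp)
    return math.sqrt(tpr * tnr)


def informedness(tp, fn, tn, fp):
    """Bookmaker Informedness: TPR + TNR - 1 (0 corresponds to chance level)."""
    tpr, tnr = rates(tp, fn, tn, fp)
    return tpr + tnr - 1.0


def false_positive_rate(tp, fn, tn, fp):
    """Share of satisfied customers wrongly flagged as dissatisfied."""
    return fp / (fp + tn)


# Hypothetical counts for an imbalanced test set (illustration only).
counts = dict(tp=80, fn=20, tn=900, fp=100)
print("G-mean:", round(g_mean(**counts), 4))
print("BM:", round(informedness(**counts), 4))
print("FP rate:", round(false_positive_rate(**counts), 4))
```

Unlike plain accuracy, both G-mean and BM fall sharply when a classifier ignores the minority class, which is why such criteria are commonly preferred on imbalanced data.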


Keywords: Machine Learning, Classification, AutoML, Customer Satisfaction Estimation, Big Data

