Comparison of Estimation Methods in Semiparametric Regression Models: An Application in Reproductive Health
Abstract
This thesis aims to predict the number of oocytes retrieved after ovulation induction in patients undergoing IVF treatment using different prediction methods, and to compare their predictive power. For this purpose, Generalized Additive Models (GAM), Multivariate Adaptive Regression Splines (MARS), Regression Trees (RT), Local Polynomial Regression (LPR), and Multiple Linear Regression (MLR) were fitted with the patients' baseline parameters as predictors, and their performances were compared. The MARS model achieved an R² of 0.613 and a root mean square error (RMSE) of 3.737 on the training dataset, and an R² of 0.476 and an RMSE of 4.492 on the test dataset. These results show that the MARS model fits the training dataset well, but its performance drops on the test dataset. The GAM and its variants, GAM Poisson and GAM Negative Binomial, exhibited different performance metrics. For the GAM model, R² was 0.595 and RMSE was 3.825 on the training dataset, and R² was 0.477 and RMSE was 4.313 on the test dataset. For the GAM Poisson model, R² was 0.680 and RMSE was 3.399 on the training dataset, and R² was 0.484 and RMSE was 4.309 on the test dataset. For the GAM Negative Binomial model, R² was 0.595 and RMSE was 3.825 on the training dataset, and R² was 0.498 and RMSE was 4.239 on the test dataset. These results indicate that, on the test dataset, the GAM Negative Binomial model outperforms the other GAM models in terms of both R² and RMSE. The regression tree (RT) model achieved an R² of 0.650 and an RMSE of 3.554 on the training dataset, and an R² of 0.456 and an RMSE of 4.390 on the test dataset.
These results show that the RT model fits the training dataset well, but its performance drops on the test dataset. The MLR model achieved an R² of 0.511 and an RMSE of 4.202 on the training dataset, and an R² of 0.439 and an RMSE of 4.471 on the test dataset. These results show that the MLR model has the lowest R² of all models on both the training and test datasets. The LPR model achieved an R² of 0.704 and an RMSE of 3.327 on the training dataset, and an R² of 0.521 and an RMSE of 4.244 on the test dataset. These results show that the LPR model performs best of all models in terms of R² on both the training and test datasets, although its test RMSE (4.244) is marginally higher than that of the GAM Negative Binomial model (4.239).
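The comparison above rests on two metrics, R² and RMSE. The sketch below computes both and ranks the test-set values reported in this abstract. It assumes the usual definitions RMSE = sqrt(mean((y − ŷ)²)) and R² = 1 − SS_res/SS_tot; the abstract does not state which R² definition the thesis used. The function names (`rmse`, `r_squared`, `rank_by_r2`, `rank_by_rmse`) and the toy oocyte counts are hypothetical and serve only as illustration. This is not the thesis's own code.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error: sqrt(mean((y - y_hat)^2))."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    sse = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    return math.sqrt(sse / len(y_true))

def r_squared(y_true, y_pred):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot

# Test-set (R², RMSE) pairs as reported in the abstract above.
TEST_METRICS = {
    "MARS": (0.476, 4.492),
    "GAM": (0.477, 4.313),
    "GAM Poisson": (0.484, 4.309),
    "GAM Negative Binomial": (0.498, 4.239),
    "RT": (0.456, 4.390),
    "MLR": (0.439, 4.471),
    "LPR": (0.521, 4.244),
}

def rank_by_r2(metrics):
    """Model names ordered from highest to lowest R²."""
    return sorted(metrics, key=lambda name: metrics[name][0], reverse=True)

def rank_by_rmse(metrics):
    """Model names ordered from lowest to highest RMSE."""
    return sorted(metrics, key=lambda name: metrics[name][1])

if __name__ == "__main__":
    # Hypothetical observed and predicted oocyte counts, for illustration only.
    observed = [8, 12, 5, 15, 10]
    predicted = [9, 11, 6, 13, 10]
    print("RMSE:", round(rmse(observed, predicted), 3))
    print("R²:", round(r_squared(observed, predicted), 3))
    print("Test ranking by R²:", rank_by_r2(TEST_METRICS))
    print("Test ranking by RMSE:", rank_by_rmse(TEST_METRICS))
```

Ranking the reported test-set values this way puts LPR first by R² but GAM Negative Binomial first by RMSE (4.239 vs. 4.244). MLR comes last by R², while MARS comes last by RMSE. The two metrics can therefore order the models differently, which is why both are worth reporting.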