Comparison of Variable Selection Methods in Machine Learning: Household Energy Consumption Prediction
Date
2024-07-01
Author
Ural, Nuri Berk
Embargo Period
Open access
Abstract
In today's digital age, the amount of data generated is rapidly increasing due to ever-growing digital activities and technological advancements, paving the way for a new field of study known as "big data." The concept of big data goes beyond traditional data processing techniques not only due to its volume but also because of its variety and velocity. Traditional statistical methods fall short in the face of the complexity and scale of this data. Therefore, within the discipline of data science, it has become inevitable to develop new and more advanced methods and technologies to effectively control, analyze, and transform this massive data flow into valuable insights.
These new methods have also led to significant advancements in fields such as Machine Learning and Artificial Intelligence, making data interpretation more efficient and effective. This evolution has positioned data science not merely as an academic curiosity but as a critical component of strategic decision-making in business, healthcare, finance, and many other sectors. Along with these developments, the model-building process has become much more complex. At this point, the importance of variable selection for improving prediction performance and achieving meaningful results becomes evident.
Variable selection is a critical step in obtaining meaningful and accurate results from large datasets. The selection of incorrect or irrelevant variables can severely degrade the overall predictive ability of the model, lead to misleading outcomes, and result in wrong decisions. Therefore, the use of advanced selection techniques and algorithms to identify the correct variables is of vital importance in data science practices. These techniques help manage model complexity, protect against overfitting, and most importantly, improve prediction performance. In particular, accurate variable selection in Machine Learning (ML) and Artificial Intelligence (AI) models can enhance the model's generalization capacity on real-world data, leading to more reliable and high-accuracy results.
This study examines the role of variable selection methods in prediction performance for energy consumption forecasting. Within this scope, the effectiveness of variable selection methods and the performance of models built with these methods are compared across various Machine Learning algorithms. The dataset used in the study was collected for predicting the energy consumption of household appliances. It includes temperature and humidity measurements taken every 10 minutes over 4.5 months by sensors placed in various rooms and outside the house. It consists of 19,735 observations and 28 variables, with no missing or incomplete observations.
The primary objective of the study is to evaluate in detail the effects of variable selection methods on the prediction performance of machine learning algorithms. Within this scope, the following methods were used: Correlation-Based Feature Selection (CFS), Variance-Based Selection, Forward Selection, Backward Elimination, Stepwise Selection, Genetic Algorithm-Based Selection, Lasso Regression-Based Selection, Ridge Regression-Based Selection, and a Robust Feature Selection Method. After each variable selection method was applied, models were built using Linear Regression, Decision Trees, Random Forests, Support Vector Machines, Principal Component Analysis, and Neural Networks, and the performance of these models was evaluated using Mean Absolute Error (MAE), Mean Squared Error (MSE), and R².
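The record does not include the study's code, so the following is only a minimal sketch of the three evaluation metrics named above, together with a simple univariate correlation filter. The filter is a deliberately simpler stand-in for illustration and is not the Correlation-Based Feature Selection (CFS) procedure used in the study. The function names, the feature names, and the toy values are all hypothetical and are not taken from the thesis dataset.

```python
from math import sqrt


def mae(y_true, y_pred):
    """Mean Absolute Error: average of |y - y_hat|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean Squared Error: average of (y - y_hat)^2."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def top_k_by_correlation(features, target, k):
    """Rank features by |correlation with the target| and keep the top k.

    `features` maps a feature name to its list of values.
    This is a plain univariate filter, not CFS.
    """
    ranked = sorted(
        features,
        key=lambda name: abs(pearson(features[name], target)),
        reverse=True,
    )
    return ranked[:k]


# Toy values, invented for illustration only (not from the study's dataset).
target = [10, 20, 30, 40, 50]
features = {
    "temp_kitchen": [1, 2, 3, 4, 5],
    "humidity_kitchen": [5, 3, 4, 1, 2],
    "noise": [2, 9, 1, 8, 3],
}

selected = top_k_by_correlation(features, target, k=2)
print("selected features:", selected)

# Metrics on a hypothetical set of predictions.
y_true = [3, 5, 7, 9]
y_pred = [2, 5, 8, 9]
print("MAE:", mae(y_true, y_pred))
print("MSE:", mse(y_true, y_pred))
print("R2:", r2(y_true, y_pred))
```

In a comparison like the one described, each selection method would yield its own feature subset, each model would be fitted on that subset, and the three metrics would be computed on held-out data so that the method and model pairs can be ranked on the same footing.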
The results of the study present a comparative analysis of the impact of different variable selection methods and machine learning algorithms on the performance of energy consumption prediction, contributing to the literature by drawing parallels with other studies in this field. Significant findings were obtained regarding which variable selection method and machine learning algorithm are most suitable for energy consumption forecasting. These findings provide guidance for data scientists and researchers in selecting appropriate methods and algorithms for their datasets. The study aims to contribute to the advancement of knowledge, research, and application methodologies in the dynamic discipline of data science.