Spam / Legitimate E-mail Classification Using Machine Learning Methods and the Bag of Words Technique
Abstract
Nowadays, we frequently use e-mail, one of the communication channels of the electronic environment. It plays an important role in our lives for many reasons, including personal communication, business activities, marketing, advertising and education. E-mail makes life easier by meeting many different types of communication needs. On the other hand, it can make life difficult when it is used for purposes other than those intended. Spam e-mails are not only annoying to recipients but can also threaten their information security. Detecting and preventing spam e-mails has therefore become an issue in its own right.
In this thesis, spam e-mails are studied comprehensively and existing work on classifying spam e-mails is reviewed. Unlike the studies in the literature, this study classifies the texts of the links placed in the e-mail body using machine learning methods and the Bag of Words technique. We analyze the effect of different N-grams on classification performance and the success of different machine learning techniques in classifying spam e-mail, using accuracy, F1 score and classification error as metrics. In addition, the effect of different N-grams is examined for the machine learning methods whose success rate exceeds 95%. The results show that Decision Tree algorithms perform poorly in spam classification, whereas Bayes, Support Vector Machine, Neural Network and Nearest Neighbor algorithms perform well. Furthermore, 5-grams were found to contribute most to classification performance.
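The feature extraction and evaluation steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the abstract does not say whether the N-grams are built over characters or words, so the sketch assumes character N-grams over the link text; the helper names `char_ngrams`, `bag_of_ngrams`, `vectorize` and `metrics` are hypothetical; and spam (label 1) is assumed to be the positive class when computing F1.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> list[str]:
    """Return the contiguous character n-grams of `text` (lower-cased).

    Assumption: character-level N-grams; the abstract does not specify the unit.
    """
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def bag_of_ngrams(text: str, n: int) -> Counter:
    """Bag-of-words representation: n-gram -> occurrence count (order is discarded)."""
    return Counter(char_ngrams(text, n))


def vectorize(texts: list[str], n: int) -> tuple[list[str], list[list[int]]]:
    """Build a shared sorted vocabulary and turn each text into a fixed-length count vector."""
    bags = [bag_of_ngrams(t, n) for t in texts]
    vocab = sorted(set().union(*bags))
    # A Counter returns 0 for n-grams absent from a given text.
    return vocab, [[bag[g] for g in vocab] for bag in bags]


def metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Accuracy, F1 (spam = 1 as positive class, an assumption) and classification error."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # Classification error is the complement of accuracy.
    return {"accuracy": accuracy, "f1": f1, "error": 1.0 - accuracy}


if __name__ == "__main__":
    vocab, X = vectorize(["Click here to win", "View your invoice"], 5)
    print(len(vocab), X[0][:5])
    print(metrics([1, 1, 0, 0], [1, 0, 0, 1]))
```

The count vectors returned by `vectorize` are the kind of input any of the classifiers named above (Bayes, SVM, neural network, nearest neighbor, decision tree) could consume; changing `n` is how one would compare different N-gram sizes such as the 5-grams reported in the abstract.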