Fake Detection and Analysis in Tweets With Machine Learning Algorithms
Abstract
The widespread use of social media has raised concerns about whether shared content is accurate or misleading. Fake detection applications are used to identify fake Twitter (X) posts and fake news texts about politics, disasters, and other adverse events. Fake detection algorithms rest on binary classification: a model trained on a training set separates texts into two classes. In this thesis, we first apply six machine learning algorithms to six datasets and compare their performance on the fake detection classification task. The results are examined in detail both per dataset and per algorithm. The datasets are in English, except for one in Turkish. They cover a variety of topics, including COVID-19, politics, the economy, earthquakes, and hurricanes. To evaluate the impact of text length, the datasets include both short tweets and longer news articles. Second, we analyze the outcomes of using different datasets for training and testing, and identify which datasets perform well when combined. We then examine how the similarity or difference between datasets affects the results by training on a combination of datasets and testing on a separate dataset that is not part of the training set. Finally, a Long Short-Term Memory (LSTM) model is developed based on studies of the layers and hyperparameters used in the LSTM algorithm.
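The cross-dataset evaluation described above (train on a combination of datasets, test on a different one) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not name the six algorithms, so a small multinomial Naive Bayes classifier stands in for them, and the dataset names, example texts, and the labels "fake" and "real" are invented toy values, not data from the thesis.

```python
import math
from collections import Counter

LABELS = ("fake", "real")  # hypothetical label names


def tokenize(text):
    # Naive whitespace tokenizer; real preprocessing would be richer.
    return text.lower().split()


def train_nb(samples):
    """Fit a multinomial Naive Bayes model on (text, label) pairs."""
    word_counts = {label: Counter() for label in LABELS}
    doc_counts = Counter()
    for text, label in samples:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set().union(*word_counts.values())
    return word_counts, doc_counts, vocab


def predict_nb(model, text):
    """Return the label with the highest smoothed log-probability."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in LABELS:
        # Log prior with add-one smoothing over the two classes.
        score = math.log((doc_counts[label] + 1) / (total_docs + len(LABELS)))
        total_words = sum(word_counts[label].values())
        # Add-one smoothing; the extra +1 reserves mass for unseen words.
        denom = total_words + len(vocab) + 1
        for tok in tokenize(text):
            score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def leave_one_dataset_out(datasets):
    """Train on all datasets but one, test on the held-out one, for each."""
    results = {}
    for held_out, test_samples in datasets.items():
        train = [s for name, ds in datasets.items() if name != held_out for s in ds]
        model = train_nb(train)
        correct = sum(predict_nb(model, text) == label for text, label in test_samples)
        results[held_out] = correct / len(test_samples)
    return results


# Invented toy datasets, for illustration only.
toy_datasets = {
    "covid": [("miracle cure shocking secret", "fake"),
              ("health ministry reports new cases", "real")],
    "politics": [("shocking secret plot exposed", "fake"),
                 ("parliament reports budget vote", "real")],
    "disaster": [("shocking miracle rescue hoax", "fake"),
                 ("ministry reports earthquake damage", "real")],
}

for name, accuracy in leave_one_dataset_out(toy_datasets).items():
    print(f"held out: {name:9s} accuracy: {accuracy:.2f}")
```

In this setup, accuracy on a held-out dataset depends on how much vocabulary it shares with the training datasets, which is the kind of similarity effect the analysis in the thesis examines.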