Retweet Prediction on Earthquake Tweets

İnce, Sevginur

dc.contributor.advisor	Sezer, Ebru
dc.contributor.author	İnce, Sevginur
dc.date.accessioned	2024-10-14T12:32:31Z
dc.date.issued	2024-09-14
dc.date.submitted	2024-09-02
dc.identifier.citation	İnce, S. (2024). Retweet Prediction on Earthquake Tweets [Master's Thesis, Hacettepe University]. YÖK Ulusal Tez Merkezi. https://tez.yok.gov.tr/UlusalTezMerkezi/	tr_TR
dc.identifier.uri	https://hdl.handle.net/11655/35917
dc.description.abstract	On February 6, 2023, an earthquake centered in Kahramanmaraş killed or injured many people. In the aftermath of such devastating earthquakes, the efficiency of communication channels in the crisis zone is of vital importance. A few decades ago there was no sign of social media, yet today social media platforms have become people's main communication channels. Twitter, one of these platforms, is widely used in Turkey. Social media makes it possible to reach millions of people with a single shared post. The amount of interaction a post receives increases its chance of being noticed by other users. In this thesis, tweets posted during and after the February 6, 2023 earthquake centered in Kahramanmaraş were used, and retweet interaction counts were divided into two classes: 'non-low' and 'moderate-high'. The data were collected with Python's Snscrape library and cover 38 days, from February 6, 2023 to March 15, 2023. The following operations were then performed in order: Tweet text was cleaned. Spelling mistakes were corrected with the Python Zemberek module. Words were reduced to their roots with the Zeyrek module. Stop words were deleted. The dataset was simplified and the IDF values of the unique words in the first week's tweets were calculated. Unique words were grouped by IDF value range. Adding 400 unique words from each of the different IDF ranges to the dataset produced 7 dataset versions, each built from a different group of unique words. Among these sets, the word set that best represents the tweet text was investigated. The XGBoost model was used in the analysis. The interaction type and class threshold that would give the best class label were also investigated. The best class label was 'Retweet' and the best class threshold was observed to be 2. 
The words that best represent the dataset were found to be the 400 words with the lowest IDF values. These words were added to the dataset as a Binary Bag of Words. Classification was then performed with various Deep Learning and Machine Learning models: Random Forest, XGBoost, LSTM and DistilBERTurk. The XGBoost model gave the best performance. The results of the XGBoost model are as follows: non-low class precision 0.75, recall 0.70, F1 score 0.73; moderate-high class precision 0.72, recall 0.77, F1 score 0.74. Average accuracy is 0.7340 and the ROC-AUC score is 0.81.	tr_TR
dc.language.iso	en	tr_TR
dc.publisher	Fen Bilimleri Enstitüsü	tr_TR
dc.rights	info:eu-repo/semantics/openAccess	tr_TR
dc.subject	Text classification	tr_TR
dc.subject	Machine learning
dc.subject	Natural language processing
dc.subject	XGBoost
dc.subject	LLM
dc.subject	Random Forest
dc.subject	LSTM
dc.subject.lcsh	Computer engineering	tr_TR
dc.title	Retweet Prediction on Earthquake Tweets	tr_TR
dc.type	info:eu-repo/semantics/masterThesis	tr_TR
dc.description.ozet	On February 6, 2023, an earthquake centered in Kahramanmaraş killed or injured many people. In the aftermath of such devastating earthquakes, the effectiveness of communication channels in the crisis zone is of vital importance. A few decades ago there was no sign of social media, yet today social media platforms have become people's main communication channels. Twitter, one of these platforms, is also widely used in Turkey. Social media makes it possible to reach millions of people with a single shared post. The amount of interaction a post receives increases its chance of being noticed by other users on social media. In this thesis, tweets posted during and after the February 6, 2023 earthquake centered in Kahramanmaraş were used, and retweet interaction counts were divided into two classes: 'non-low' and 'moderate-high'. The data were collected with Python's Snscrape library and cover 38 days, from February 6, 2023 to March 15, 2023. The following operations were then performed in order: Tweet text was cleaned. Spelling mistakes were corrected with the Python Zemberek module. Words were reduced to their roots with the Zeyrek module. Stop words were deleted. The dataset was simplified and the IDF values of the unique words in the first week's data were calculated. Unique words were grouped by IDF value range. Adding 400 unique words from each of the different IDF ranges to the dataset produced 7 dataset versions, each built from a different group of unique words. Among these sets, the word set that best represents the tweet text was investigated. The XGBoost model was used in the analyses. The interaction type and class threshold that would give the best class label were also investigated. The best class label was 'Retweet' and the best class threshold was observed to be 2. The words that best represent the dataset were found to be the 400 words with the lowest IDF values. These words were added to the dataset as a Binary Bag of Words. 
Classification was then performed with various Deep Learning and Machine Learning models: Random Forest, XGBoost, LSTM and DistilBERTurk. The XGBoost model gave the best performance. The results of the XGBoost model are as follows: non-low class precision 0.75, recall 0.70, F1 score 0.73; moderate-high class precision 0.72, recall 0.77, F1 score 0.74. Average accuracy is 0.7340 and the ROC-AUC score is 0.81.	tr_TR
dc.contributor.department	Computer Engineering	tr_TR
dc.embargo.terms	Open access	tr_TR
dc.embargo.lift	2024-10-14T12:32:31Z
dc.funding	None	tr_TR
dc.subtype	workingPaper	tr_TR
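
The feature construction described in the abstract (IDF per unique word, selection of the lowest-IDF words, Binary Bag of Words features, and a retweet threshold of 2) can be illustrated with a minimal sketch. This is not the thesis code. The function names, the unsmoothed formula log(N / df), the reading of the threshold as "2 or more retweets", and the four toy tweets are all assumptions made for illustration; the thesis selects 400 words, while the toy example selects 2.

```python
import math
from collections import Counter


def idf_table(docs):
    """Return IDF = log(N / df) for every unique token (assumed formula)."""
    n_docs = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each word once per document
    return {word: math.log(n_docs / count) for word, count in df.items()}


def lowest_idf_vocab(idf, k):
    """Pick the k words with the lowest IDF; ties are broken alphabetically."""
    ranked = sorted(idf.items(), key=lambda item: (item[1], item[0]))
    return [word for word, _ in ranked[:k]]


def binary_bow(tokens, vocab):
    """Binary Bag of Words: 1 if the vocabulary word occurs in the tweet."""
    present = set(tokens)
    return [1 if word in present else 0 for word in vocab]


def retweet_class(retweets, threshold=2):
    """Assumed labeling: 1 = 'moderate-high' when retweets >= threshold."""
    return 1 if retweets >= threshold else 0


# Hypothetical, already cleaned and stemmed tweets with made-up retweet counts.
tweets = [
    (["deprem", "yardım", "enkaz"], 5),
    (["deprem", "yardım"], 0),
    (["deprem", "çadır"], 3),
    (["deprem", "yardım", "su"], 1),
]

docs = [tokens for tokens, _ in tweets]
idf = idf_table(docs)
vocab = lowest_idf_vocab(idf, k=2)  # the thesis uses 400 words
features = [binary_bow(tokens, vocab) for tokens in docs]
labels = [retweet_class(count) for _, count in tweets]
```

The resulting `features` and `labels` lists are in the shape a classifier such as XGBoost or Random Forest would accept as input; the model training itself is left out of the sketch.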

Files in this item:

Name:: Sevginur İnce Yüksek Lisans ...
Size:: 2.810Mb
Format:: PDF

This item appears in the following collection(s).

Computer Engineering Department Thesis Collection [267]
