Developing a Comprehensive Emotion Lexicon for Turkish
Abstract
Social media and mood meters on websites have made it easy to express ideas and opinions clearly. These communication channels produce data that can be used across many disciplines and have helped researchers in several ways. Analysing such data, commonly called text analysis, can yield insights in fields ranging from politics to psychology.
Emotion analysis focuses on extracting emotions from textual data, which enables studies with interdisciplinary applications. For example, emotion analysis can reveal the sentiments and emotions customers feel toward a particular product, service, or brand, and this information can help enhance customer satisfaction and foster customer loyalty. Emotion analysis also holds potential for fraud detection, since it can identify distinctive patterns of emotional language that are closely linked to fraudulent behavior.
A review of recent studies on emotion analysis in Turkish shows that most of them rely on English resources translated into Turkish. However, some structures inevitably cannot be translated, and agglutinative (suffix-based) languages such as Turkish make this problem even more complex. Our main hypothesis in this study is therefore that the data used to analyse a language should be in that language itself, which makes a more accurate and reliable analysis possible. To this end, we set out to create a Turkish emotion analysis dataset covering many emotions.
The work in this thesis was carried out in three main phases. In the first phase, the goal was to build a Turkish emotion dataset from 100 literary works. To this end, sentences taken from the books were searched for 213 different Turkish emotion words, and the number of times each word appeared together with the related emotion was recorded. This produced, for each emotion, an emotion-word vector recording how often each term occurred together with that emotion.
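The first phase can be sketched as a sentence-level co-occurrence count. The Python below is a minimal illustration under stated assumptions, not the thesis code: the naive tokenizer, the prefix match used as a crude substitute for handling Turkish suffixes, and the sample sentences and emotion stems are all hypothetical. For each emotion, it counts the other words that appear in sentences containing that emotion word.

```python
from collections import Counter, defaultdict


def turkish_lower(text: str) -> str:
    # str.lower() maps "I" to "i" and "İ" to "i" plus a combining dot;
    # Turkish needs I -> ı and İ -> i, so those two are replaced first.
    return text.replace("I", "ı").replace("İ", "i").lower()


def tokenize(sentence: str) -> list[str]:
    # Hypothetical naive tokenizer: lowercase, turn punctuation into spaces, split.
    cleaned = "".join(
        ch if ch.isalnum() or ch.isspace() else " " for ch in turkish_lower(sentence)
    )
    return cleaned.split()


def build_emotion_vectors(sentences, emotion_stems):
    """For each emotion stem, count the words that share a sentence with it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = tokenize(sentence)
        for emotion in emotion_stems:
            # Assumption: a prefix match stands in for real Turkish suffix analysis.
            if any(tok.startswith(emotion) for tok in tokens):
                # Count every co-occurring word except the emotion word itself.
                vectors[emotion].update(t for t in tokens if not t.startswith(emotion))
    return vectors


# Hypothetical sample input (not taken from the thesis corpus).
sentences = [
    "Çocuk sevinçle koştu.",
    "Annesi de sevinç içinde güldü.",
    "Korkuyla kapıyı kapattı.",
]
vectors = build_emotion_vectors(sentences, ["sevinç", "korku"])
print(dict(vectors["sevinç"]))
print(dict(vectors["korku"]))
```

A plain prefix match misses inflected forms with consonant alternation (for example sevinç becomes sevinci in the accusative), so a fuller pipeline would more likely rely on a Turkish morphological analyser.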
In the second phase, clustering algorithms were applied to the data to group related emotions together. First, meaningless words were removed from the data and the data was scaled. Because the resulting data was very large and sparse, principal component analysis (PCA) was applied so that only the most informative features were retained. The reduced data was then tested with several clustering algorithms, and the results of each were recorded.
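The second phase (cleaning, scaling, PCA, clustering) can be illustrated with NumPy alone. This is a sketch under assumptions: the abstract does not say which words were judged meaningless, how the data was scaled, or which clustering algorithms were tried, so the minimum-count filter, z-score scaling, and k-means used here are stand-ins, and the count matrix is invented. PCA is computed from the SVD of the centred matrix, and k-means is seeded deterministically with a farthest-point rule.

```python
import numpy as np


def filter_rare_words(matrix, vocab, min_count=2):
    # Hypothetical cleaning rule: drop word columns whose total count is below min_count.
    keep = matrix.sum(axis=0) >= min_count
    return matrix[:, keep], [word for word, k in zip(vocab, keep) if k]


def standardize(matrix):
    # Scale each word column to zero mean and unit variance; constant columns stay at zero.
    std = matrix.std(axis=0)
    std[std == 0] = 1.0
    return (matrix - matrix.mean(axis=0)) / std


def pca(matrix, n_components):
    # PCA via SVD of the centred matrix: project each emotion row onto the top axes.
    centred = matrix - matrix.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:n_components].T


def farthest_point_init(points, k):
    # Deterministic seeding: start at row 0, then add the point farthest from those chosen.
    chosen = [0]
    for _ in range(1, k):
        dists = np.linalg.norm(points[:, None, :] - points[chosen][None, :, :], axis=2)
        chosen.append(int(dists.min(axis=1).argmax()))
    return points[chosen].copy()


def kmeans(points, k, n_iter=100):
    # Lloyd iterations: assign each point to its nearest centroid, then recompute centroids.
    centroids = farthest_point_init(points, k)
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        updated = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return labels


# Invented emotion-by-word count matrix (rows: emotions, columns: words).
emotions = ["sevinç", "mutluluk", "korku", "endişe"]
vocab = ["gülmek", "neşe", "karanlık", "titremek", "nadir"]
counts = np.array([
    [9, 8, 0, 1, 1],
    [8, 9, 1, 0, 0],
    [0, 1, 9, 8, 0],
    [1, 0, 8, 9, 0],
], dtype=float)

filtered, kept_vocab = filter_rare_words(counts, vocab)
reduced = pca(standardize(filtered), n_components=2)
labels = kmeans(reduced, k=2)
for emotion, label in zip(emotions, labels):
    print(emotion, label)
```

Because the toy rows form two well-separated groups, the joy-related and fear-related emotions land in different clusters. With real data, the number of PCA components and clusters would have to be chosen empirically, and other algorithms could be swapped in for k-means.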
In the final phase, an interface was built to test how the model behaves on input text. To extract emotions from an input, the cosine distance between the input text and each emotion vector was calculated. The model was then evaluated on texts of different lengths expressing different emotions. A cross-validation study compared the emotion assignments produced in this study with those produced by ChatGPT. Four individuals agreed that they experienced at least one of the five identified emotions, indicating that the predominant emotion conveyed in the text was successfully recognized.
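The emotion extraction step can be expressed as a nearest-vector ranking. In this hedged sketch, the input is assumed to be already tokenized, both the input and each emotion are represented as sparse word-count vectors, and emotions are ranked by cosine distance (1 minus cosine similarity), smallest first. The emotion vectors, the rank_emotions helper, and its top_n parameter are hypothetical; top_n simply lets a caller keep, for example, the five closest emotions.

```python
import math
from collections import Counter


def cosine_distance(a: Counter, b: Counter) -> float:
    # 1 - cosine similarity of two sparse count vectors; 1.0 if either vector is empty.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def rank_emotions(text_tokens, emotion_vectors, top_n=5):
    # Rank emotions by cosine distance to the input's bag of words (ties broken by name).
    text_vec = Counter(text_tokens)
    scored = sorted(
        (cosine_distance(text_vec, vec), emotion) for emotion, vec in emotion_vectors.items()
    )
    return [emotion for _, emotion in scored[:top_n]]


# Hypothetical emotion vectors and input tokens, for illustration only.
emotion_vectors = {
    "sevinç": Counter({"gülmek": 5, "neşe": 4, "bayram": 2}),
    "korku": Counter({"karanlık": 5, "titremek": 4}),
    "hüzün": Counter({"gözyaşı": 5, "ayrılık": 3, "karanlık": 1}),
}
tokens = ["bayram", "sabahı", "neşe", "ile", "gülmek"]
print(rank_emotions(tokens, emotion_vectors, top_n=1))  # → ['sevinç']
```

Using distance rather than similarity only changes the sort direction; an empty vector is treated as maximally distant so that texts with no known words do not match any emotion.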