Developing A Turkish Sentiment Lexicon Using Tone Distributions
Karaşlar, Muazzez Şule
With advancing technology and the growing use of the internet, many sources of data have become available to researchers. Analyzing this data and extracting meaningful information from it is a research topic in the field of natural language processing (NLP). Sentiment analysis, a sub-field of NLP, evaluates content with respect to the opinion it conveys, classifying it as either positive or negative. Most sentiment analysis research follows one of two approaches: lexicon-based or machine-learning-based. The lexicon-based approach requires a dictionary of positive and negative words, which is used to evaluate a text. Although there is an abundance of such studies for English, the same cannot be claimed for Turkish. Therefore, in our study, we focus on constructing a comprehensive and accurate Turkish sentiment lexicon. In this paper, we aim to develop a Turkish sentiment lexicon with a novel methodology: statistical tone density functions computed from a very large document corpus obtained from mainstream Turkish news agencies. To our knowledge, this is the first time in the literature that a Turkish sentiment lexicon has been created with this method. The lexicon not only assigns tone values instead of boolean polarities, but also provides sharper tones, which is usually not possible with other approaches in the literature. We evaluate the performance of this lexicon in comparison with similar lexicons in the literature. Results show that the sentiment lexicon constructed in this study achieves comparable performance and offers many possibilities for improvement.
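The abstract does not describe how the tone density functions are computed, so the following is only a minimal sketch of the general idea of a lexicon with continuous tone values, not the authors' method. It assumes each document already carries a numeric tone in [-1, 1], summarizes each word by the mean and spread of the tones of the documents it occurs in, and scores a new text by averaging the word tones. The function names, the `min_count` threshold, and the sample sentences are all hypothetical, and whitespace splitting stands in for the tokenization and morphological normalization that real Turkish text would need.

```python
from collections import defaultdict
from statistics import mean, pstdev


def build_tone_lexicon(documents, min_count=2):
    """Map each word to (mean tone, tone spread) over the documents containing it.

    `documents` is an iterable of (text, tone) pairs, where tone is assumed
    to be a float in [-1, 1]. Words seen in fewer than `min_count` documents
    are dropped because their tone estimate would be unreliable.
    """
    tones_by_word = defaultdict(list)
    for text, tone in documents:
        # Count each word once per document (naive whitespace tokenization).
        for word in set(text.split()):
            tones_by_word[word].append(tone)
    return {
        word: (mean(tones), pstdev(tones))
        for word, tones in tones_by_word.items()
        if len(tones) >= min_count
    }


def score_text(text, lexicon):
    """Average the mean tones of the known words; 0.0 if none are known."""
    tones = [lexicon[word][0] for word in text.split() if word in lexicon]
    return mean(tones) if tones else 0.0


# Hypothetical toy corpus: (text, document tone).
corpus = [
    ("harika bir gün", 0.8),
    ("harika sonuç", 0.6),
    ("kötü bir gün", -0.7),
    ("kötü sonuç", -0.5),
]
lexicon = build_tone_lexicon(corpus)
print(lexicon["harika"], lexicon["kötü"])
print(score_text("harika gün", lexicon), score_text("kötü sonuç", lexicon))
```

In this toy corpus, words that appear in both positive and negative documents (such as "bir") end up with a mean tone near zero and a large spread, while words that appear only in documents of one polarity receive a sharper tone. This is one way a continuous tone value can carry more information than a boolean polarity.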