The Relationship Between Named Entities and Subject Headings in the Categorization of News Texts
Abstract
Text categorization makes it possible to access information within large volumes of unrefined data, and it saves time for people who want to reach information more easily and practically. News is one of the most important practical research areas for text categorization, because the volume of news text can grow rapidly. This thesis investigates the relationship between subject codes and named entities in the context of text categorization, using 5834 news texts obtained from the BilCol-2005 news corpus. To this end, the 5834 news texts were tagged with seven named entity types (person, organization, location, date, time, money and percentage). The tagged texts were then classified under 13 subject codes from the main subject taxonomy of the IPTC (International Press Telecommunications Council). The investigation was based on tagged and untagged words and their relations with the IPTC news codes. Key findings were reported as frequency and percentage values, along with some stati