Automatic WordNet Construction Using Wikipedia Data
Date
2019-07-02

Author
Haziyev, Farid
Open access

Abstract
Building WordNets from comparable corpora is a task that has been explored before, but using Wikipedia for this purpose has not been explored in depth. Wikipedia has a structure that makes it a comparable corpus for many languages. Using this structure, we can apply our methods to resource-rich languages and then map the results to resource-poor languages. In this paper, we present one bilingual and two multilingual methods. In our bilingual method, Wikipedia's structure is used both for finding the correct synsets and for mapping them to the target language. In our multilingual methods, we find the correct synsets in each Wikipage and then map those synsets to the words in the target language using vectorization. We grouped the page names of 14 languages that have a WordNet available and created Wikipages, where each Wikipage consists of several translations of the same page. To find the correct synsets in the Wikipages, we used a rule-based and a graph-based method. After finding the correct synsets in each Wikipage, we applied vectorization and mapped those synsets to the words in the target-language translation of the Wikipage. We then compared our methods with each other and with some state-of-the-art methods, using the German and Russian languages as ground truth. Our methods show results comparable to those of the state-of-the-art methods, and our results improve when a more complex word sense disambiguation (WSD) method is used.
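
The abstract does not spell out how Wikipages are assembled; as a rough illustration of one plausible approach, the Python sketch below builds a group of translations for a single page from Wikipedia's interlanguage links via the MediaWiki API. The function name, the example title, and the language set are ours for illustration, not the thesis's.

import requests

API = "https://{lang}.wikipedia.org/w/api.php"

def interlanguage_titles(title, source_lang="en", target_langs=None):
    # Fetch the interlanguage-link titles of one Wikipedia page and
    # return {language code: page title} -- one candidate "Wikipage",
    # i.e. a group of translations of the same concept.
    params = {
        "action": "query",
        "prop": "langlinks",
        "titles": title,
        "lllimit": 500,
        "format": "json",
    }
    resp = requests.get(API.format(lang=source_lang), params=params, timeout=10)
    resp.raise_for_status()
    page = next(iter(resp.json()["query"]["pages"].values()))
    titles = {ll["lang"]: ll["*"] for ll in page.get("langlinks", [])}
    if target_langs is not None:
        titles = {l: t for l, t in titles.items() if l in target_langs}
    titles[source_lang] = title
    return titles

# Example: collect translations of "Bank" for a few languages.
print(interlanguage_titles("Bank", target_langs={"de", "ru", "fr", "es"}))

Restricting target_langs to the 14 languages with available WordNets would yield Wikipages of the kind the abstract describes.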
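Likewise, the vectorization step that maps synsets to target-language words is only named, not specified. Below is a minimal sketch of one common realization, assuming cross-lingually aligned word embeddings: src_emb and tgt_emb are hypothetical word-to-vector dictionaries, and scoring candidates by cosine similarity against the averaged lemma vector is our choice, not necessarily the thesis's.

import numpy as np

def synset_vector(lemmas, src_emb):
    # Average the embeddings of a synset's lemmas -- a simple way to
    # vectorize a synset; the thesis's exact scheme may differ.
    vecs = [src_emb[w] for w in lemmas if w in src_emb]
    return np.mean(vecs, axis=0) if vecs else None

def map_synset(lemmas, candidates, src_emb, tgt_emb):
    # Map a synset to the target-language candidate whose vector is
    # closest by cosine similarity. Assumes src_emb and tgt_emb live
    # in a shared (cross-lingually aligned) space -- an assumption,
    # not a claim about the thesis's setup.
    sv = synset_vector(lemmas, src_emb)
    if sv is None:
        return None
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    scored = [(w, cos(sv, tgt_emb[w])) for w in candidates if w in tgt_emb]
    return max(scored, key=lambda s: s[1], default=(None,))[0]

# Toy example with 2-d vectors (purely illustrative).
src_emb = {"bank": np.array([1.0, 0.0]), "shore": np.array([0.9, 0.3])}
tgt_emb = {"Ufer": np.array([0.95, 0.2]), "Bank": np.array([0.0, 1.0])}
print(map_synset(["bank", "shore"], ["Ufer", "Bank"], src_emb, tgt_emb))  # -> Ufer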