Automatic WordNet Construction Using Wikipedia Data

Haziyev, Farid

dc.contributor.advisor	Ercan, Gönenç
dc.contributor.author	Haziyev, Farid
dc.date.accessioned	2019-10-21T12:29:56Z
dc.date.issued	2019-07-02
dc.date.submitted	2019-06-19
dc.identifier.uri	http://hdl.handle.net/11655/9372
dc.description.abstract	Building WordNets from comparable corpora has been explored before, but using Wikipedia for this purpose has not been studied in depth. Wikipedia has a structure that makes it a comparable corpus for many languages. Using this structure, we can apply our methods to resource-rich languages and then map the results to resource-poor languages. In this thesis, we present one bilingual and two multilingual methods. In our bilingual method, Wikipedia’s structure is used both to find the correct synsets and to map them to the target language. In our multilingual methods, we find the correct synsets occurring in each Wikipage and then map those synsets to words in the target language using vectorization. We grouped the 14 languages that have a WordNet available by page name and created Wikipages, where each Wikipage consists of several translations. To find the correct synsets in the Wikipages, we used a rule-based method and a graph-based method. After finding the correct synsets in each Wikipage, we applied vectorization and mapped those synsets to the words in the target-language translation of the page. We then compared our methods with each other and with several state-of-the-art methods, using German and Russian ground-truth data. Our methods show results comparable to those of the state-of-the-art methods, and our results improve when a more complex word sense disambiguation (WSD) method is used.	tr_TR
dc.language.iso	en	tr_TR
dc.publisher	Fen Bilimleri Enstitüsü	tr_TR
dc.rights	info:eu-repo/semantics/openAccess	tr_TR
dc.rights	CC0 1.0 Universal	*
dc.rights.uri	http://creativecommons.org/publicdomain/zero/1.0/	*
dc.subject	WordNet
dc.subject	Word sense disambiguation
dc.subject	Word embeddings
dc.subject	Wikipedia
dc.subject.lcsh	Konu Başlıkları Listesi::Teknoloji. Mühendislik	tr_TR
dc.title	Automatic WordNet Construction Using Wikipedia Data	tr_TR
dc.title.alternative	Vikipedi Verilerini Kullanarak Otomatik Olarak WordNet Oluşturmak	tr_TR
dc.type	info:eu-repo/semantics/masterThesis	tr_TR
dc.description.ozet	Building WordNets from comparable corpora is widely studied, but using Wikipedia for this purpose has not been studied much. Wikipedia has a comparable structure for many languages. Therefore, using this structure, we can apply our methods to resource-rich languages and then map the results to other languages. In this project, we present one bilingual and two multilingual methods. In our bilingual method, Wikipedia’s structure is used both to find the correct synsets and to map them to the target language. In our multilingual methods, we find the correct synsets occurring in each Wikipedia page and then map these synsets to words in the target language using vectorization. In our multilingual methods, we grouped the 14 languages that have a WordNet by page name and created Wikipedia pages consisting of several translations. To find the correct synsets in the Wikipedia pages, we used rule-based and graph-based methods. After finding the correct synsets in the Wikipedia pages, we mapped them to words in the target language using vectorization. We then compared our methods with one another and with other state-of-the-art methods using German and Russian ground-truth data. As a result, we saw that our methods give results similar to those of state-of-the-art methods. We also saw that the results improved when a more complex word sense disambiguation method was tried.	tr_TR
dc.contributor.department	Bilgisayar Mühendisliği	tr_TR
dc.embargo.terms	Acik erisim	tr_TR
dc.embargo.lift	2019-10-21T12:29:56Z

Files in this item

Name: 10261721.pdf
Size: 1.391 MB
Format: PDF


Name: license_rdf
Size: 701 bytes
Format: application/rdf+xml


This item appears in the following Collection(s)

Bilgisayar Mühendisliği Bölümü Tez Koleksiyonu [253]


Except where otherwise noted, this item's license is described as info:eu-repo/semantics/openAccess