dc.contributor.author | Ozturk, Burak | |
dc.contributor.author | Can, Burcu | |
dc.date.accessioned | 2021-06-07T07:30:02Z | |
dc.date.available | 2021-06-07T07:30:02Z | |
dc.date.issued | 2019 | |
dc.identifier.issn | 1300-0632 | |
dc.identifier.uri | http://dx.doi.org/10.3906/elk-1804-10 | |
dc.identifier.uri | http://hdl.handle.net/11655/24590 | |
dc.description.abstract | Turkish is an agglutinative language with rich morphology. A Turkish verb can have thousands of different word forms. Therefore, sparsity becomes an issue in many Turkish natural language processing (NLP) applications. This article presents a model for Turkish lexicon expansion. We aimed to expand the lexicon with a morphological segmentation system, reversing the segmentation task into a generation task. Our model uses finite-state automata (FSA) to incorporate orthographic features and morphotactic rules. We extracted orthographic features by capturing the phonological operations that are applied to a word whenever a suffix is added. Each FSA state corresponds to either a stem category or a suffix category. Stems are clustered by part of speech (i.e., noun, verb, or adjective) and suffixes are clustered by their allomorphic features. We generated approximately 1 million word forms from only a few thousand Turkish stems with an accuracy of 82.36%, which will help to reduce the number of out-of-vocabulary words in other NLP applications. Although our experiments were performed on Turkish, the same model is also applicable to other agglutinative languages such as Hungarian and Finnish. | |
dc.language.iso | en | |
dc.relation.isversionof | 10.3906/elk-1804-10 | |
dc.rights | Attribution 4.0 International | |
dc.rights | info:eu-repo/semantics/openAccess | |
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ | |
dc.subject | finite-state automata | |
dc.subject | lexicon expansion | |
dc.subject | morphological generation | |
dc.subject | Morphology | |
dc.title | Turkish Lexicon Expansion By Using Finite State Automata | |
dc.type | info:eu-repo/semantics/article | |
dc.type | info:eu-repo/semantics/publishedVersion | |
dc.relation.journal | Turkish Journal Of Electrical Engineering And Computer Sciences | |
dc.contributor.department | Computer Engineering | |
dc.identifier.volume | 27 | |
dc.identifier.issue | 2 | |
dc.description.index | WoS | |
dc.description.index | Scopus | |
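
The abstract describes automata whose states are stem or suffix categories and whose transitions apply phonological operations as suffixes are attached. The following is a minimal sketch of that general idea, not the authors' model: the state names, the two suffix templates (plural -lAr, locative -DA), and the functions `realize` and `generate` are hypothetical and chosen only for illustration. It handles vowel harmony for A and consonant voicing for D, and ignores every other Turkish alternation.

```python
# Toy illustration only; all names and rules here are assumptions, not the paper's.
BACK_VOWELS = set("aıou")
FRONT_VOWELS = set("eiöü")
VOICELESS = set("fstkçşhp")

# Hypothetical automaton: state -> list of (suffix template, next state).
# "A" is realized as a/e by vowel harmony, "D" as d/t by voicing.
TRANSITIONS = {
    "NOUN": [("lAr", "PLURAL"), ("DA", "CASE")],
    "PLURAL": [("DA", "CASE")],
    "CASE": [],
}


def realize(word, template):
    """Attach a suffix template to a word, resolving A and D from the word's ending."""
    vowels = BACK_VOWELS | FRONT_VOWELS
    last_vowel = next(ch for ch in reversed(word) if ch in vowels)
    a = "a" if last_vowel in BACK_VOWELS else "e"
    d = "t" if word[-1] in VOICELESS else "d"
    return word + template.replace("A", a).replace("D", d)


def generate(stem, state="NOUN"):
    """Walk the automaton depth-first and collect every reachable word form."""
    forms = []
    for template, next_state in TRANSITIONS[state]:
        form = realize(stem, template)
        forms.append(form)
        forms.extend(generate(form, next_state))
    return forms


if __name__ == "__main__":
    for stem in ("ev", "kitap"):
        print(stem, generate(stem))
```

Even this two-suffix automaton yields three forms per noun stem; adding more suffix categories multiplies the paths, which is the kind of growth that lets a few thousand stems expand into a very large lexicon.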