Weakly-Supervised Relation Extraction
Abstract
Relation extraction is a key component of many natural language processing applications, including text summarization and question answering. Of the many approaches to relation extraction, most rely on supervised learning, which requires a large training dataset. Such datasets must be hand-labeled by experts, which makes annotation time-consuming and expensive. The alternative used in this thesis is weakly supervised relation extraction, which reduces the cost of labeling training data. We propose a weakly supervised relation extraction approach inspired by an existing weakly supervised model named REPEL. In both REPEL and our approach, extraction patterns are derived from unlabeled text using a given set of relation seed examples. To obtain more useful extraction patterns, we introduce labeling functions into our method. These labeling functions are simple rules that analyze the syntax of a candidate pattern, and they help select candidate patterns with higher confidence. We evaluate our method on the same dataset used by REPEL so that the results can be compared directly. Experiments are conducted in both English and Turkish. Both systems require a number of relation seed examples to learn patterns from unlabeled data. In our experiments, our approach generally gives better results than REPEL for both languages, and it outperforms REPEL significantly when fewer relation seed examples are used. In the English experiments with few relation seeds, our approach is approximately 15 times more successful than REPEL. Even with more relation seeds, our approach remains more successful.