Predicting Disease-Gene Associations Via Machine Learning
Date
2024
Author
Kuzucu, Osman Onur
Open access
Abstract
In the quest to elucidate disease etiology and develop advanced diagnostic and treatment
tools, knowing disease-gene relationships is of great importance. Traditional approaches
based on manual curation fall short due to limited scalability and precision. On the other
hand, graph neural networks (GNNs) enable the analysis of complex relational data within
biological networks. Although the GNN-based methods developed to date have produced
positive results in predicting unknown biological relationships, new models with high
prediction performance and generalisation capabilities are still needed for practical
use in biology and medicine. In this thesis, we propose GLADIGATOR (Graph
Learning bAsed DIsease Gene AssociaTiOn pRediction), a deep learning model designed
with an encoder-decoder architecture to predict disease-gene associations. GLADIGATOR
creates a heterogeneous graph that primarily integrates two types of biological components,
genes and diseases, and the connections between them. The model was trained on
gene-gene, disease-disease and gene-disease relationships drawn from source biological
databases, with protein sequence representations generated by the Prot-T5[1] protein
language model and disease representations generated by the BioBERT[2] language model
serving as node feature vectors. In the analyses conducted, GLADIGATOR showed
superior prediction accuracy and ranked as the highest performer among 14 different
disease-gene association prediction methods.
Literature-based investigation of selected predictions supported the biological relevance of
the predicted novel associations and highlighted the effectiveness of the GNN-based approach
in identifying potential candidate genes for specific diseases. Once validated
experimentally, these results may provide valuable information for drug discovery.
GLADIGATOR not only adds to the computational approaches developed for disease-gene
association prediction but also highlights the potential of GNNs to accelerate the
discovery of new biological relationships in biomedical research.
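
The encoder-decoder idea described in the abstract can be illustrated with a toy sketch. This is not the GLADIGATOR implementation: the node names, edge lists and 2-dimensional feature vectors are invented placeholders standing in for the Prot-T5 and BioBERT representations. The encoder here is a single untrained mean-aggregation step and the decoder is a sigmoid over a dot product, both chosen only as simple assumptions.

```python
import math

# Placeholder node features. In the thesis these roles are filled by Prot-T5
# (gene/protein) and BioBERT (disease) embeddings; the 2-d values below are
# invented purely for illustration.
features = {
    "gene:G1": [1.0, 0.0],
    "gene:G2": [0.0, 1.0],
    "disease:D1": [1.0, 1.0],
}

# Typed edges of a tiny heterogeneous graph (all hypothetical).
edges = {
    "gene-gene": [("gene:G1", "gene:G2")],
    "gene-disease": [("gene:G1", "disease:D1")],
    "disease-disease": [],
}


def build_adjacency(nodes, typed_edges):
    """Collect undirected neighbours across every edge type."""
    adj = {node: set() for node in nodes}
    for pairs in typed_edges.values():
        for u, v in pairs:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def encode(node_features, adj):
    """Encoder: one round of mean aggregation over a node and its neighbours."""
    embeddings = {}
    for node, x in node_features.items():
        group = [x] + [node_features[n] for n in sorted(adj[node])]
        embeddings[node] = [sum(col) / len(group) for col in zip(*group)]
    return embeddings


def decode(embeddings, gene, disease):
    """Decoder: dot product of the two embeddings, squashed by a sigmoid."""
    dot = sum(a * b for a, b in zip(embeddings[gene], embeddings[disease]))
    return 1.0 / (1.0 + math.exp(-dot))


adj = build_adjacency(features, edges)
emb = encode(features, adj)
known = decode(emb, "gene:G1", "disease:D1")      # edge present in the graph
candidate = decode(emb, "gene:G2", "disease:D1")  # edge absent: a candidate link
print(f"G1-D1: {known:.3f}  G2-D1: {candidate:.3f}")
```

A trained model would replace the fixed averaging with learned, edge-type-specific message-passing layers and would fit the decoder against known gene-disease associations; the sketch only shows how node features and graph structure combine into an association score.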