Image, Sequence, and Interactome Based Prediction of Subcellular Localization of Proteins
Abstract
Knowledge of subcellular localization (SL) of proteins is essential for drug
development, systems biology, proteomics, and functional genomics. Due to the high
costs associated with experimental studies, it has become crucial to develop
computational systems to accurately predict proteins’ SLs. With different modes of
biological data (e.g., biomolecular sequences, biomedical images, unstructured text,
etc.) becoming readily available to ordinary scientists, it is possible to leverage
complementary types of data to increase both the performance and coverage of
predictions. In this study, we propose HoliLoc, a new method for predicting protein
SLs via multi-modal deep learning. Our approach makes use of three different types
of data (i.e., 2D confocal microscopy images, amino acid sequences, and protein-protein interactions, or PPIs) to predict SLs of proteins in a multi-label manner for 22
different cell compartments using protein language models, graph embeddings and
convolutional and feed-forward neural networks. The system was trained in an end-to-end manner, and performance was measured on an unseen hold-out test dataset.
The average test performance of the individual models (each using a single data type) was
0.18 (macro F1-score) and 0.55 (accuracy), whereas HoliLoc (the fusion of the three
modalities) reached 0.26 (F1-score) and 0.60 (accuracy), indicating the
effectiveness of the proposed multi-modal learning approach. According to our
comparison against state-of-the-art SL predictors, HoliLoc displays highly competitive
performance. HoliLoc is distributed as an open-access programmatic tool, which is
anticipated to benefit life science researchers by reducing the cost and time required
for wet-lab experiments by accurately predicting the SLs of the protein of interest in
advance.
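The gap between the reported macro F1-scores and accuracies is typical of multi-label problems with many rare classes. The sketch below illustrates why, using a hypothetical toy dataset of four proteins and three compartments; it is not HoliLoc's evaluation code. It assumes that accuracy is computed per label (over every protein-compartment cell) and that a compartment with an undefined F1 is scored as 0; the abstract does not state either convention.

```python
from typing import List

Matrix = List[List[int]]  # rows = proteins, columns = compartments (0/1 labels)


def macro_f1(y_true: Matrix, y_pred: Matrix) -> float:
    """Unweighted mean of the per-compartment F1-scores."""
    n_labels = len(y_true[0])
    scores = []
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        denom = 2 * tp + fp + fn
        # Assumption: an undefined F1 (no positives, no predictions) counts as 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n_labels


def label_accuracy(y_true: Matrix, y_pred: Matrix) -> float:
    """Fraction of protein-compartment cells predicted correctly."""
    cells = [(t, p) for tr, pr in zip(y_true, y_pred) for t, p in zip(tr, pr)]
    return sum(1 for t, p in cells if t == p) / len(cells)


# Hypothetical toy data: compartment 0 is common, compartments 1 and 2 are rare.
y_true = [[1, 0, 0], [1, 0, 0], [1, 1, 0], [0, 0, 1]]
# A predictor that always outputs only the common compartment.
y_pred = [[1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0]]

print(f"macro F1 = {macro_f1(y_true, y_pred):.3f}")
print(f"accuracy = {label_accuracy(y_true, y_pred):.3f}")
```

In this toy case, a predictor that ignores the two rare compartments still gets most cells right, while its macro F1 is pulled down because every compartment carries equal weight in the average.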
Link
https://hdl.handle.net/11655/34562
Collections
- Bioinformatics [12]