Evaluation of Variants Responsible for the Phenotype through Bioinformatics Analyses of Whole Exome Sequencing Data in Rare Metabolic Diseases
Abstract
Kosukcu, C. Bioinformatics Analysis and Variant Interpretation of Whole Exome Sequencing Data in Inborn Errors of Metabolism. Hacettepe University Graduate School of Health Sciences, Ph.D. Thesis in Molecular Metabolism, Ankara 2024. Inborn errors of metabolism are classified pathophysiologically into three groups: intoxication-type disorders (amino acid metabolism disorders, galactosemia, etc.), energy metabolism disorders (mitochondrial diseases), and complex molecule diseases (organelle dysfunctions). In intoxication-type metabolic diseases, enzyme deficiency causes the substrate proximal to the enzymatic reaction to accumulate, which produces a toxic effect. Energy metabolism disorders result from deficiencies of enzymes involved in the synthesis of ATP (PDH deficiency, Krebs cycle enzyme deficiencies, respiratory chain dysfunction, etc.). Complex molecule diseases generally result from deficiencies of enzymes that act outside the mitochondria, in the lysosome, the peroxisome, and the endoplasmic reticulum-Golgi apparatus system. Lysosomal storage diseases, peroxisomal diseases, and hereditary glycosylation disorders are classic examples of this group. In addition, studies carried out by research groups of which we are a part have begun to define new disease groups such as vesicular trafficking disorders and autophagy dysfunctions. Clinically, hereditary metabolic diseases present with single or multiple organ/system involvement, depending on the disease type. The central nervous system is one of the systems most frequently involved in metabolic diseases (>55%). Early diagnosis and treatment of hereditary metabolic diseases are therefore very important for preventing mortality and morbidity.
In recent years, advanced genetic analysis methods have become the main approach to diagnosing rare or very rare metabolic diseases that cannot be diagnosed by traditional methods. In particular, whole exome sequencing, which allows all coding regions of genes to be analyzed, has become the most important tool for identifying new disease genes and for the molecular diagnosis of metabolic/neurometabolic diseases that show genetic heterogeneity. Exome analyses have also accelerated the discovery of new candidate genes: when a gene detected by exome analysis has not previously been associated with any clinical phenotype, it emerges as a new candidate gene. Whether performed by whole genome or whole exome analysis, this process has placed a huge mass of information that needs to be interpreted in front of researchers and, increasingly, clinicians in medical practice. The task of finding the single candidate gene responsible for a disease within this mass of information has given rise to the field of bioinformatics, which has already become the most fundamental field of study in medical practice. Asking the most appropriate questions of this mass of information in the context of the phenotype, and evaluating the answers in the most appropriate way, constitutes the basic process of bioinformatics. It is therefore essential that bioinformatic analysis be accompanied by in-depth, clinically relevant phenotype information. In addition, in-silico analysis methods such as pathogenicity prediction, copy number analysis, protein modeling, and pathway analysis guide the researcher performing the bioinformatics analysis in identifying the single candidate gene responsible for the disease. Bioinformatics analysis is the most critical step in processing raw data, identifying candidate genes, detecting and filtering genetic variations, and detecting pathogenic mutations.
Advanced bioinformatics analysis of the data makes it possible to detect different mutation types (missense, nonsense, truncation, small INDELs) and also to determine large copy number changes. In this doctoral thesis, multiple software tools were used for the bioinformatic analysis of Whole Exome Sequencing (WES) data. BWA (Burrows-Wheeler Aligner) was used for aligning FASTQ data, SAMtools for filtering repetitive sequences, BEDtools for calculating the read depth of exonic regions, and GATK for the variant calling steps. CLC Genomics Server 24.0.1 was used for reanalysis of the raw data and for detection of copy number variations (CNVs). FoldX was used to repair PDB structures before missense mutations were modeled on the PDB files. For in-silico protein modeling, four different software tools (DynaMut2, PremPS, INPS-3D, and FoldX) were used to calculate ΔΔG values. Pathway analyses reflecting the interactions between the identified genes and newly discovered genes that have not been reported in the literature were conducted with STRING 12.0. In this study, raw Whole Exome Sequencing data from 213 individuals across 162 families were analyzed using bioinformatic methods, and the results were interpreted. A definitive diagnosis was made for 155 cases, and a total of 170 variants were identified. Of these 170 variants, 103 were missense mutations (61%), 18 were frameshift mutations (10%), 17 were nonsense mutations (10%), 17 were splice site mutations (10%), 6 were copy number variations (4%), 5 were start codon mutations (3%), and 4 were in-frame deletions or insertions (2%). The 103 missense mutations were modeled on the protein structure using the four methods, and in-silico predictions were made to illustrate the structural changes that the mutations induce in the protein.
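The alignment, read filtering, exon depth, and variant calling steps named above could be chained roughly as in the following sketch. It only builds command strings and executes nothing. It is not the thesis pipeline: the function name `build_wes_commands`, every file name, the thread count, and the read group values are hypothetical, and treating "filtering repetitive sequences" as duplicate-read removal with `samtools markdup -r` is an assumption.

```python
import shlex


def build_wes_commands(sample, reference, fastq_r1, fastq_r2, exons_bed, threads=4):
    """Return shell command strings for a BWA -> SAMtools -> BEDtools -> GATK pass.

    Nothing is executed here; all file names are hypothetical placeholders.
    """
    q = shlex.quote
    # Read group header; HaplotypeCaller expects reads to carry a sample name.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"
    bam = f"{sample}.dedup.bam"
    return [
        # Align paired-end reads, add mate tags, coordinate-sort,
        # then remove duplicate reads (assumed meaning of the filtering step).
        f"bwa mem -t {threads} -R {q(read_group)} {q(reference)} {q(fastq_r1)} {q(fastq_r2)}"
        f" | samtools fixmate -m - -"
        f" | samtools sort -@ {threads} -"
        f" | samtools markdup -r - {q(bam)}",
        # Index the final BAM so downstream tools can access it by region.
        f"samtools index {q(bam)}",
        # Mean read depth per exonic interval.
        f"bedtools coverage -a {q(exons_bed)} -b {q(bam)} -mean > {q(sample + '.exon_depth.txt')}",
        # Call variants restricted to the exonic intervals.
        f"gatk HaplotypeCaller -R {q(reference)} -I {q(bam)} -L {q(exons_bed)} -O {q(sample + '.vcf.gz')}",
    ]


for command in build_wes_commands("S1", "ref.fa", "S1_R1.fq.gz", "S1_R2.fq.gz", "exons.bed"):
    print(command)
```

Keeping the commands as data rather than running them makes the order of the steps easy to review before anything is submitted to a cluster or workflow manager.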
The identified variants were subjected to pathway analysis at the gene level to detect relationships among the molecular pathways involved in rare inherited metabolic diseases and to explore where the newly identified genes sit within these pathways. In this thesis, WES analysis was performed on 213 individuals from 162 families; a molecular diagnosis was reached for 155 patients, while 58 cases remained undiagnosed. Accordingly, the molecular diagnostic success rate of the bioinformatic analyses was calculated as 73%. Although the laboratory methods and sequencing techniques used to obtain Whole Exome Sequencing data are highly developed, accurate and comprehensive bioinformatics analyses remain extremely important for clinical interpretation.
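The reported totals can be checked with a few lines of arithmetic. The sketch below uses only the counts stated in the abstract. The variable names are illustrative, and reading the seven variant counts as per-variant numbers rests on the observation that they sum to the 170 identified variants.

```python
# Variant counts by class, as reported in the abstract.
variant_counts = {
    "missense": 103,
    "frameshift": 18,
    "nonsense": 17,
    "splice site": 17,
    "copy number variation": 6,
    "start codon": 5,
    "in-frame deletion/insertion": 4,
}

total_variants = sum(variant_counts.values())  # 170

# Share of each variant class among all identified variants, in percent.
variant_shares = {
    name: 100 * count / total_variants for name, count in variant_counts.items()
}

# Diagnostic yield: diagnosed individuals over all individuals analyzed.
diagnosed, undiagnosed = 155, 58
analyzed = diagnosed + undiagnosed  # 213
diagnostic_yield = 100 * diagnosed / analyzed

for name, share in variant_shares.items():
    print(f"{name}: {share:.1f}%")
print(f"diagnostic yield: {diagnostic_yield:.1f}%")
```

The unrounded shares may differ by a fraction of a percentage point from the rounded figures quoted in the text, and the yield rounds to the reported 73%.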
Keywords: Inborn errors of metabolism, Next Generation Sequencing, Whole Exome Sequencing, bioinformatics, protein modelling, Copy Number Variation, deep phenotyping