Discovery of Tandem and Interspersed Segmental Duplications Using High-Throughput Sequencing

Soylev, Arda; Le, Thong Minh; Amini, Hajar; Alkan, Can; Hormozdiari, Fereydoun

dc.contributor.author	Soylev, Arda
dc.contributor.author	Le, Thong Minh
dc.contributor.author	Amini, Hajar
dc.contributor.author	Alkan, Can
dc.contributor.author	Hormozdiari, Fereydoun
dc.date.accessioned	2021-06-07T07:30:04Z
dc.date.available	2021-06-07T07:30:04Z
dc.date.issued	2019
dc.identifier.issn	1367-4803
dc.identifier.uri	http://dx.doi.org/10.1093/bioinformatics/btz237
dc.identifier.uri	http://hdl.handle.net/11655/24596
dc.description.abstract	Motivation: Several algorithms have been developed that use high-throughput sequencing technology to characterize structural variations (SVs). Most existing approaches focus on detecting relatively simple types of SVs, such as insertions, deletions and short inversions. However, complex SVs are of crucial importance, and several have been associated with genomic disorders. To better understand the contribution of complex SVs to human disease, we need new algorithms to accurately discover and genotype such variants. Additionally, because their sequencing signatures are similar, inverted duplications or gene conversion events that include inverted segmental duplications are often characterized as simple inversions; likewise, duplications and gene conversions in direct orientation may be called as simple deletions. Therefore, accurate algorithms are still needed to fully characterize complex SVs and thereby improve the calling accuracy of simpler variants. Results: We developed novel algorithms to accurately characterize tandem, direct and inverted interspersed segmental duplications using short-read whole genome sequencing datasets. We integrated these methods into our TARDIS tool, which can now detect various types of SVs using multiple sequence signatures such as read pair, read depth and split read. We evaluated the prediction performance of our algorithms through several experiments on both simulated and real datasets. In the simulation experiments at 30x coverage, TARDIS achieved 96% sensitivity with only a 4% false discovery rate. For the experiments on real data, we used two haploid genomes (CHM1 and CHM13) and one diploid human genome (NA12878, from the Illumina Platinum Genomes set). Comparison of our results with orthogonal PacBio call sets from the same genomes revealed higher accuracy for TARDIS than for state-of-the-art methods.
Furthermore, we showed that our approach has a surprisingly low false discovery rate when predicting tandem, direct and inverted interspersed segmental duplications on CHM1 (<5% for the top 50 predictions).
dc.language.iso	en
dc.relation.isversionof	10.1093/bioinformatics/btz237
dc.rights	Attribution 4.0 United States
dc.rights	info:eu-repo/semantics/openAccess
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.title	Discovery of Tandem and Interspersed Segmental Duplications Using High-Throughput Sequencing
dc.type	info:eu-repo/semantics/article
dc.type	info:eu-repo/semantics/publishedVersion
dc.relation.journal	Bioinformatics
dc.contributor.department	Computer Engineering
dc.identifier.volume	35
dc.identifier.issue	20
dc.description.index	WoS

Files in this item:

Name: 258.pdf
Size: 677.1 KB
Format: PDF

This item appears in the following collection(s):

Computer Engineering Department Article Collection [43]

Unless otherwise stated, the license of this item is: Attribution 4.0 United States