A rapid heuristic algorithm to solve the single individual haplotype assembly problem

Document Type: Research Article

Authors

1 Department of Electrical Engineering, Sharif University of Technology, Tehran, Iran

2 Department of Electrical Engineering, Sharif University of Technology, Tehran, Iran; Computer Science Department, University of Copenhagen, Copenhagen, Denmark

Abstract

Haplotype assembly is the computational process of reconstructing the two distinct nucleotide sequences of a chromosome pair from the sequencing reads of an individual. The ability to identify haplotypes benefits genomic studies in many areas, such as drug design, population studies, and disease diagnosis. Although several approaches have been proposed to obtain highly accurate haplotypes, fast and precise haplotype assembly remains a challenging task. Because of the enormous volume of high-throughput sequencing data, algorithm speed is crucial to making haplotype assembly feasible at the scale of the human genome. This study introduces a heuristic technique that enables rapid haplotype reconstruction while maintaining acceptable accuracy. Our approach consists of two stages. In the first, a partial haplotype is created and extended over a number of iterations; in each iteration, a novel metric assesses the quality of the reconstructed haplotype in order to arrive at the best solution. In the second, the reconstructed haplotypes are refined to increase their accuracy. The results show that the proposed approach reconstructs haplotypes with an acceptable level of accuracy. In terms of speed, the algorithm outperforms competing approaches, especially on high-coverage sequencing data.
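To make the refinement idea concrete, the following is a minimal sketch and not the authors' algorithm. The abstract does not specify its novel quality metric, so the sketch swaps in the widely used minimum error correction (MEC) score as a stand-in. It also assumes a diploid individual with biallelic heterozygous SNPs, so that the second haplotype is the complement of the first. The names `Read`, `read_cost`, `mec_score`, and `refine` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Assumption: a read maps SNP index -> observed allele (0 or 1).
Read = Dict[int, int]


def read_cost(read: Read, hap: List[int]) -> int:
    """Mismatches between a read and the closer of hap or its complement."""
    mismatches = sum(1 for pos, allele in read.items() if hap[pos] != allele)
    # Mismatches against the complementary haplotype are the remaining sites.
    return min(mismatches, len(read) - mismatches)


def mec_score(reads: List[Read], hap: List[int]) -> int:
    """Stand-in quality metric: total corrections needed over all reads."""
    return sum(read_cost(read, hap) for read in reads)


def refine(reads: List[Read], hap: List[int]) -> Tuple[List[int], int]:
    """Greedy refinement: keep any single-site flip that lowers the score."""
    hap = list(hap)  # work on a copy
    best = mec_score(reads, hap)
    improved = True
    while improved:
        improved = False
        for pos in range(len(hap)):
            hap[pos] ^= 1  # tentatively flip the allele at this site
            score = mec_score(reads, hap)
            if score < best:
                best = score
                improved = True
            else:
                hap[pos] ^= 1  # revert the flip
    return hap, best


# Toy data: four error-free reads drawn from haplotypes 0000 and 1111.
reads = [
    {0: 0, 1: 0, 2: 0},
    {1: 1, 2: 1, 3: 1},
    {0: 0, 2: 0, 3: 0},
    {0: 1, 1: 1, 3: 1},
]
initial = [0, 0, 1, 0]  # a draft haplotype with one wrong site
print(mec_score(reads, initial))  # 3
print(refine(reads, initial))     # ([0, 0, 0, 0], 0)
```

Each pass of this sketch rescans every read for every site, which is far from the speed the abstract targets; it is meant only to show what refining a draft haplotype against the reads can look like.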

Keywords

Main Subjects