Development of an Efficient Hybrid Method for Motif Discovery in DNA Sequences

Document Type : Research Article

Authors

1 Corresponding Author, Reza Akbari is with the Department of Computer Engineering and Information Technology, Shiraz University of Technology, Shiraz, Iran, (email: akbari@sutech.ac.ir)

2 Vahid Zeighami is with the Department of Mathematics and Industrial Engineering, Ecole Polytechnique, de Montreal, Montreal, Quebec, Canada, (email: vahid.zeighami@polymtl.ca)

3 Koorush Ziarati is with the Department of Computer Science and Engineering, Shiraz University, Shiraz, Iran, (email: ziarati@shirazu.ac.ir)

4 Ismail Akbari is a graduated student from Department of Industrial Engineering, Iran University of Science of Science and Technology, Tehran, Iran, (email: ismail.akbari80@gmail.com)

Abstract

This work presents a hybrid method for motif discovery in DNA sequences. The proposed method called SPSO-Lk, borrows the concept of Chebyshev polynomials and uses the stochastic local search to improve the performance of the basic PSO algorithm as a motif finder. The Chebyshev polynomial concept encourages us to use a linear combination of previously discovered velocities beyond that proposed by the basic PSO algorithm. Under this method, to balance between exploration and exploitation, at each iteration step, a local region is associated with each candidate particle, and a local exploration performed in this blob. The stochastic local search employs an intelligent repulsion/attraction mechanism to navigate a particle to explore this local region beyond that defined by the search algorithm to achieve a better solution. Over the successive iterations, the size of local region dynamically decreases. Also a non-linear dynamic inertia weight is introduced to further improve the performance of SPSO-Lk approach. The SPSO-Lk is tested on different sets of simulated and real nucleotide sequences to discover implanted DNA motifs. Experimental results show that the SPSO-Lk is effective, and provides competitive results in comparison with the performance of other algorithms investigated in this consideration. 

Keywords


[1]    M.-F. Sagot. "Spelling approximate repeated or common motifs using a suffix tree", In C. L. Lucchesi and A. V. Moura, editors, LATIN’98: Theoretical Informatics, Lecture Notes in Computer Science, pp. 111–127. Springer-Verlag, 1998.
[2]    G. Pavesi, G. Mauri, G. Pesole, "An algorithm for finding signals of unknown length in DNA sequences", Bioinformatics;17 Suppl 1:S207-14, 2001.
[3]    E. Eskin, U. Keich, M. S. Gelfand, P. A. Pevzner. “Genome-Wide Analysis of Bacterial Promoter Regions.” In Proc. of the Pacific Symposium on Biocomputing (PSB-2003). Kaua’i, Hawaii: January 3-7, 2003.
[4]    G. Z. Hertz, and G. D. Stormo, “Identifying DNA and protein patterns with statistically significant alignments of multiple sequences”, Bioinformatics. 15(7-8):563-77,1999.
[5]    T. L. Bailey, and C. Elkan, "The value of prior knowledge in discovering motifs with MEME", Proc. Int. Conf. Intell. Syst. Mol. Biol., 3, 21–29, 1995.
[6]    C. E. Rouchka, A brief overview of Gibbs sampling." [online] Available:http://kbrin.a-bldg.louisville.edu/rouchka/HOMEPAGE/PAPERS/gibbs.pdf.
[7]    N. Karaoglu, S. M. Stroh, and B. Manderick, "GAMOT: An Efficient Genetic Algorithm for Finding Challenging Motifs in DNA Sequences", In Proc. of Trim Siz, 2006.
[8]    C. T. Hardin, E. C. Rouchka, “DNA motif detection using particle swarm optimization and expectation-maximization”, In IEEE Proc. of Swarm Intelligence Symposium, SIS, pp. 181- 184, 2005.
[9]    U. Keich and P. Pevzner. Finding motifs in the twilight zone. In Proceedings of the Sixth Annual International Conference on Research in Computational Molecular Biology, RECOMB, pages 195–204, 2002.
[10]  E. Rocke and M. Tompa. An algorithm for finding novel gapped motifs in DNA sequences. In Sorin Istrail, Pavel Pevzner, and Michael Waterman, editors, Proceedings of the 2nd Annual International Conference on Computational Molecular Biology (RECOMB-98), pages 228–233, New York, 1998. ACM Press.
[11]  L. Yang, E. Huang, and V.B. Bajic. Some implementation issues of heuristic methods for motif extraction from dna sequences. Int.J.Comp.Syst.Signals (To Appear), 2004.
[12]  M.-F. Sagot. "Spelling approximate repeated or common motifs using a suffix tree", In C. L. Lucchesi and A. V. Moura, editors, LATIN’98: Theoretical Informatics, Lecture Notes in Computer Science, pp. 111–127. Springer-Verlag, 1998.
[13]  G. Pavesi, G. Mauri, G. Pesole, "An algorithm for finding signals of unknown length in DNA sequences", Bioinformatics;17 Suppl 1:S207-14, 2001.
[14]  E. Eskin, U. Keich, M. S. Gelfand, P. A. Pevzner. “Genome-Wide Analysis of Bacterial Promoter Regions.” In Proc. of the Pacific Symposium on Biocomputing (PSB-2003). Kaua’i, Hawaii: January 3-7, 2003.
[15]  G. Z. Hertz, and G. D. Stormo, “Identifying DNA and protein patterns with statistically significant alignments of multiple sequences”, Bioinformatics. 15(7-8):563-77, 1999.
[16]  T. L. Bailey, and C. Elkan, "The value of prior knowledge in discovering motifs with MEME", In Proceeding of International Conference on Intelligent Systems and Molecular Biology, Vol. 3, pp. 21–29, 1995.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
[17]  X. Wu, J. Cheng, C. Song, and B. Wang. A combined model and a varied gibbs sampling algorithm used for motif discovery. In Yi-Ping Phoebe Chen, editor, 2nd Asia-Pacific Bioinformatics Conference, volume 29 of CRPIT, pp. 99-104, Dunedin, New Zealand, 2004. ACS.
[18]  N. Karaoglu, S. M. Stroh, and B. Manderick, "GAMOT: An Efficient Genetic Algorithm for Finding Challenging Motifs in DNA Sequences", In Proc. of Trim Siz, 2006.
[19]  J. Zhu and M. Zhang. “SCPD: a promoter database of the yeast Saccharomyces cerevisiae”, Bioinformatics 15:563-577, 1999.
[20]  J. Kennedy and R. Eberhart, "Particle swarm optimization", in Proc. IEEE Int. Conf. Neural Networks, vol. 4, 1995, pp. 1942–1947.
[21]  B. C. H. Chang, A. Ratnaweera, and S. K. Halgamuge, "Particle Swarm Optimization for Protein Motif Discovery", In journal of Genetic Programming and Evolvable Machines, Vol. 5, pp. 203–214, 2004.
[22]  C. T. Hardin, E. C. Rouchka, “DNA motif detection using particle swarm optimization and expectation-maximization”, In IEEE Proc. of Swarm Intelligence Symposium, SIS, pp. 181- 184, 2005.
[23]  Y. Shi and R. C. Eberhart, “Parameter selection in particle swarm optimization,” in Proc. Evolutionary Programming VII, vol. 1447, 1998, pp. 591–600.
[24]  H. Hoos, and T. Stützle, “Stochastic Local Search: Foundations and Applications”, Morgan Kaufmann Publishers, San Francisco, CA, USA. 2004.
[25]  R. Akbari, K. Ziarati, “Combination of Particle Swarm Optimization and Stochastic Local Search for Multimodal Function Optimization”, In proceeding of IEEE Pacific-Asia Workshop on Computational Intelligence and Industrial Application, pp. 388-392, 2008.
[26]  M. O. Rayes, V. Trevisan, “Factorization of Chebyshev Polynomials”, 1998.
[27]  A. Ratnaweera, S. K. Halgamuge, and H. C. Watson, “Self-Organizing Hierarchical Particle Swarm Optimizer With Time-Varying Acceleration Coefficients”, In IEEE Transactions on Evolutionary Computation, Vol. 8, No. 3, pp. 240-255, 2004.
[28]  Chia-Nan Ko a,*, Ying-Pin Chang b, Chia-Ju Wu, “An orthogonal-array-based particle swarm optimizer with nonlinear time-varying evolution”, Applied Mathematics and Computation 191 (2007) 272–279.
[29]  J. Riget, and J. S. Vesterstrom, “A Diversity-guided Particle Swarm Optimizer-The ARPSO EVALife”, Tech. Rep. 2002-02.
[30]  S. T. Hsieh, T. Y. Sun, C. L. Lin, and C. C. Liu, “Effective Learning Rate Adjustment of Blind Source Separation Based on an Improved Particle Swarm Optimizer”, In IEEE Transactions on Evolutionary Computation, Vol. ,No. , pp. , 2007.
[31]  J. H. Seo1, C. H. Im, C. G. Heo, J. K. Kim, H. K. Jung, and C. G. Lee. “Multimodal Function Optimization Based on Particle Swarm Optimization”, In IEEE Transaction on Magnetics, Vol. 42, No. 4, pp. 1095-1098, 2006.
[32]  R. Akbari, and K. Ziarati, “A Rank Based Particle Swarm optimization with Dynamic Adaptation”, Journal of Computational and Applied Mathematics, Elsevier, Vol. 235, No. 8, pp. 2694-2714, 2011.