Fast SFFS-Based Algorithm for Feature Selection in Biomedical Datasets

Document Type: Research Article

Authors

1. MSc Student, Control and Intelligent Processing Center of Excellence (CIPCE), Electrical and Computer Engineering Department, University of Tehran, Tehran, Iran

2. Professor, Department of Diagnostic Radiology, Henry Ford Hospital, Detroit, MI, USA; Professor, School of Cognitive Sciences (SCS), Institute for Research in Fundamental Sciences (IPM), Tehran, Iran

Abstract

Biomedical datasets usually include a large number of features relative to the number of samples, and some of these dimensions may be only weakly relevant, or even irrelevant, to the output class. Selecting an optimal subset of features is therefore critical, not only to reduce the processing cost but also to improve the classification results. To this end, this paper presents a hybrid filter-wrapper feature selection method built on a modified sequential forward floating search (SFFS) algorithm. The filter stage evaluates each feature by its ability to predict the output class and to complement the other features. The candidate subset produced by the filter stage is then evaluated, in the wrapper stage, by k-fold cross-validation of a support vector machine (SVM) with a user-defined classification margin. Applying the proposed SFFS method to five biomedical datasets shows its superiority, in terms of classification accuracy and execution time, over the conventional SFFS method and a previously improved SFFS variant.
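To make the wrapper side of the abstract concrete, the sketch below implements a plain (unmodified) SFFS loop that scores candidate feature subsets by the mean k-fold cross-validation accuracy of an SVM. It is a minimal illustration under stated assumptions, not the paper's method: the function names (sffs, cv_score), the linear kernel, and the synthetic toy data are choices made here, and the paper's filter stage, its user-defined classification margin, and its specific SFFS modifications are not reproduced.

# A minimal sketch, not the paper's algorithm: plain SFFS with an SVM
# k-fold cross-validation wrapper criterion.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC


def cv_score(X, y, subset, k=5):
    """Wrapper criterion: mean k-fold CV accuracy of an SVM on the given features."""
    return cross_val_score(SVC(kernel="linear"), X[:, list(subset)], y, cv=k).mean()


def sffs(X, y, target_size, k=5):
    """Plain SFFS: forward inclusion of the best feature, then conditional
    ('floating') exclusion of features while that beats the best subset
    found so far at the smaller size."""
    n_features = X.shape[1]
    selected, best = [], {}  # best[s] = best score seen for subsets of size s
    while len(selected) < target_size:
        # Forward step: add the feature that maximizes the wrapper criterion.
        remaining = [f for f in range(n_features) if f not in selected]
        best_f = max(remaining, key=lambda f: cv_score(X, y, selected + [f], k))
        selected.append(best_f)
        best[len(selected)] = max(best.get(len(selected), -np.inf),
                                  cv_score(X, y, selected, k))
        # Floating step: drop a feature (other than the one just added) whenever
        # the reduced subset beats the best subset of that smaller size.
        while len(selected) > 2:
            scores = {f: cv_score(X, y, [g for g in selected if g != f], k)
                      for f in selected if f != best_f}
            weakest = max(scores, key=scores.get)
            if scores[weakest] > best.get(len(selected) - 1, -np.inf):
                selected.remove(weakest)
                best[len(selected)] = scores[weakest]
            else:
                break
    return selected


if __name__ == "__main__":
    # Toy usage on synthetic data; the five biomedical datasets are not bundled here.
    from sklearn.datasets import make_classification
    X, y = make_classification(n_samples=200, n_features=20, n_informative=5,
                               random_state=0)
    print("selected features:", sffs(X, y, target_size=5))

The floating (conditional exclusion) step is what distinguishes SFFS from plain sequential forward selection: it allows the search to discard a feature that was useful early on but becomes redundant once better-complementing features join the subset.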

Keywords

