Document Type: Research Article
Faculty of Computer Science and Engineering, Shahid Beheshti University, Tehran, Iran
Today, users can consult a wide range of information sources across different fields, usually because they have a question in mind. They can search for an answer by entering a few keywords into a search engine, or they can ask the question in more detail on a Community Question Answering (CQA) network so that experts can give a more comprehensive answer. To receive a proper answer, the question must include all the necessary details. Questions posted on these networks can therefore be divided into clear and unclear questions. This study uses several machine learning approaches to extract distinctive features from questions for use in classifying them. To extract these features, a word vector is created for each question, and an unsupervised algorithm then places questions with similar word vectors in the same group. Next, recurring concepts are extracted from each group, and the rate at which they occur in each question forms that question's feature vector. Finally, the questions are classified from the extracted feature vectors using ensemble classification models. The contribution of this study is an efficient classification model, together with efficient high-resolution feature extraction, for distinguishing clear from unclear questions in CQA networks. Compared with other baselines and transformer-based models on different datasets, the proposed method achieves high accuracy.
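The feature-extraction steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the bag-of-words counts, the cosine-based k-means-style grouping, the use of frequent terms as "recurring concepts", and every function name and parameter (`word_vector`, `cluster`, `frequent_terms`, `feature_vector`, `k`, `top_n`) are assumptions made for illustration only. The final ensemble classification step is omitted; the resulting feature vectors would be passed to an ensemble classifier of the reader's choice.

```python
import math
import re
from collections import Counter


def word_vector(question):
    """Bag-of-words counts; a simple stand-in for the abstract's word vector."""
    return Counter(re.findall(r"[a-z]+", question.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(vectors, k, iterations=10):
    """Tiny k-means-style grouping (assumed algorithm); needs k <= len(vectors).

    The first k vectors seed the centroids, so the result is deterministic.
    """
    centroids = [Counter(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each question to the most similar centroid.
        labels = [max(range(k), key=lambda c: cosine(v, centroids[c]))
                  for v in vectors]
        # Recompute each centroid as the sum of its members
        # (cosine similarity ignores the scale, so no division is needed).
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                total = Counter()
                for m in members:
                    total.update(m)
                centroids[c] = total
    return labels


def frequent_terms(vectors, labels, k, top_n=3):
    """Per group, the top_n terms that occur in the most member questions."""
    terms = []
    for c in range(k):
        doc_freq = Counter()
        for v, lab in zip(vectors, labels):
            if lab == c:
                doc_freq.update(v.keys())
        terms.append([word for word, _ in doc_freq.most_common(top_n)])
    return terms


def feature_vector(vector, cluster_terms):
    """One feature per group: share of the question's tokens that are
    among that group's frequent terms."""
    total = sum(vector.values())
    return [sum(vector[w] for w in terms) / total if total else 0.0
            for terms in cluster_terms]


# Hypothetical example questions, invented for this sketch.
questions = [
    "How do I sort a list in Python?",
    "It does not work, please help",
    "Python list sort raises an error, why?",
    "Please help, nothing works",
]
vectors = [word_vector(q) for q in questions]
labels = cluster(vectors, k=2)
terms = frequent_terms(vectors, labels, k=2)
features = [feature_vector(v, terms) for v in vectors]
for question, feats in zip(questions, features):
    print(question, feats)
```

Each question ends up with one value per group, so questions that share vocabulary with a group of specific, detailed questions score high on that group's feature, while vague questions score high elsewhere. In practice the word vectors, the clustering algorithm, and the number of groups would be chosen and tuned on real CQA data.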