Classifying the Clarity of Questions in CQA Networks: A Topic-Based Approach

Document Type: Research Article


Faculty of Computer Science and Engineering, Shahid Beheshti University, Tehran, Iran


Today, there are various sources of information in different fields that users can refer to. Generally, it is a question in the user's mind that leads them to consult these sources. Users can search for an answer by entering a few keywords into a search engine, or they can post their question in more detail on Community Question Answering (CQA) networks so that experts can give a more comprehensive answer. To get a proper answer, the question must include all the required details. Questions posted on these networks can be divided into clear and unclear ones. In this study, an attempt has been made to extract distinctive features from the questions through various machine learning approaches, which can then be used to classify them. To extract these features, a word vector was created for each question, and unsupervised algorithms were then used to place questions with similar word vectors in the same group. Afterwards, recurring concepts were extracted from each group, and the repetition rate of these concepts in each question forms that question's feature vector. Finally, the questions were classified on the basis of the extracted feature vectors using ensemble classification models. The contribution of this study is an efficient classification model, together with an efficient, high-resolution feature extraction method, for classifying clear and unclear questions in CQA networks. Compared with other baselines and transformer-based models on different datasets, the proposed method achieves high accuracy.
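The feature-extraction pipeline summarized in the abstract can be sketched in a few dozen lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify the word-vector model, the clustering algorithm, or how "repetitive concepts" are selected, so here bag-of-words count vectors, plain k-means, and the top-n most frequent terms of each cluster stand in for them. The helper names (`tokenize`, `word_vectors`, `kmeans`, `cluster_concepts`, `feature_vector`), the small stopword list, and the toy questions are all hypothetical.

```python
"""Hypothetical sketch of topic-based feature extraction for question clarity.

Assumptions (not taken from the paper): bag-of-words count vectors as the
"word vector", plain k-means as the unsupervised grouping step, and the
top_n most frequent non-stopword terms of each cluster as its "concepts".
"""
import math
import random
from collections import Counter

# Tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "is", "to", "in", "how", "i", "do", "my",
             "of", "it", "what", "and", "on", "for", "with"}


def tokenize(text):
    """Lower-case, strip surrounding punctuation, drop stopwords and empties."""
    tokens = (w.strip(".,?!:;()'\"").lower() for w in text.split())
    return [t for t in tokens if t and t not in STOPWORDS]


def word_vectors(questions):
    """Return (vocabulary, L2-normalised bag-of-words vector per question)."""
    docs = [Counter(tokenize(q)) for q in questions]
    vocab = sorted(set().union(*docs))
    vectors = []
    for counts in docs:
        v = [float(counts[w]) for w in vocab]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vocab, vectors


def _sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per vector."""
    rng = random.Random(seed)
    centroids = [list(c) for c in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: each vector goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: _sq_dist(v, centroids[j]))
                  for v in vectors]
        # Update step: each non-empty cluster's centroid becomes its mean.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_concepts(questions, labels, top_n=3):
    """Union (sorted) of the top_n most frequent terms in each cluster."""
    per_cluster = {}
    for q, lab in zip(questions, labels):
        per_cluster.setdefault(lab, Counter()).update(tokenize(q))
    concepts = set()
    for counts in per_cluster.values():
        concepts.update(w for w, _ in counts.most_common(top_n))
    return sorted(concepts)


def feature_vector(question, concepts):
    """Repetition rate of each concept: occurrences / number of tokens."""
    tokens = tokenize(question)
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[c] / total for c in concepts]


# Toy usage on four made-up questions (illustrative data only).
questions = [
    "How do I sort a list in Python?",
    "Python list sort by key not working",
    "Help me please",
    "It does not work, help",
]
vocab, vecs = word_vectors(questions)
labels = kmeans(vecs, k=2)
concepts = cluster_concepts(questions, labels)
X = [feature_vector(q, concepts) for q in questions]
print(concepts)
print(X)
```

Each row of `X` would then be paired with a clear/unclear label and passed to an ensemble classifier. The abstract does not name the ensemble models used, so something like scikit-learn's `RandomForestClassifier` or `AdaBoostClassifier` is only one plausible choice; likewise, the number of clusters `k` and `top_n` are free parameters in this sketch.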

