Improving Semi-Supervised Clustering Algorithms with Active Query Selection

Authors

  • Walid Atwa, Computer Science Department, Faculty of Computers and Information, Menoufia University, 32511, Egypt
  • Mahmoud Emam, Mathematics and Computer Science Department, Faculty of Science, Menoufia University, 32511, Egypt

DOI:

https://doi.org/10.25728/assa.2019.19.4.659

Abstract

Semi-supervised clustering algorithms use a small amount of supervised data, in the form of pairwise constraints, to improve clustering performance. However, most current algorithms are passive: the pairwise constraints are provided beforehand and selected at random. This can lead to the use of constraints that are redundant, unnecessary, or even harmful to the clustering results. In this paper, we address the problem of constraint selection to improve the performance of semi-supervised clustering algorithms. Building on the concept of Maximum Mean Discrepancy (MMD), we select a batch of the most informative instances, namely those that minimize the difference in distribution between the labeled and unlabeled data. We then query these instances against the existing neighborhoods to determine which neighborhood each instance belongs to. Experimental comparisons with state-of-the-art methods on several real-world datasets demonstrate the effectiveness and efficiency of the proposed method.
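
To make the selection criterion concrete, the following is a minimal Python sketch of a biased empirical estimate of squared MMD, together with a simple greedy batch selection built on it. It is an illustration under stated assumptions, not the authors' algorithm: the Gaussian (RBF) kernel, the `gamma` parameter, the greedy strategy, the choice to compare the selected batch against the whole pool, and the function names `rbf_kernel`, `mmd_squared`, and `greedy_select` are all hypothetical choices made here. The neighborhood-querying step is not shown.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two points given as sequences of floats.

    The kernel choice and gamma are assumptions made for this sketch.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mmd_squared(xs, ys, gamma=1.0):
    """Biased empirical estimate of squared MMD between samples xs and ys."""
    def mean_kernel(a, b):
        total = sum(rbf_kernel(p, q, gamma) for p in a for q in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) - 2.0 * mean_kernel(xs, ys) + mean_kernel(ys, ys)


def greedy_select(pool, batch_size, gamma=1.0):
    """Greedily pick indices so that the selected batch stays close in MMD to the pool.

    Hypothetical selection strategy; the paper's exact optimization may differ.
    """
    selected = []
    remaining = list(range(len(pool)))
    for _ in range(min(batch_size, len(pool))):
        # Choose the candidate whose addition yields the smallest squared MMD.
        best = min(
            remaining,
            key=lambda i: mmd_squared([pool[j] for j in selected + [i]], pool, gamma),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    # Toy one-dimensional pool with two well-separated groups.
    toy_pool = [[0.0], [0.1], [5.0], [5.1]]
    print(greedy_select(toy_pool, batch_size=2))
```

On a pool with well-separated groups, a batch chosen this way tends to spread across the groups rather than concentrate in one, which is the intuition behind matching the distribution of the queried instances to that of the remaining data.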

Published

2019-12-20

How to Cite

Atwa, W., & Emam, M. (2019). Improving Semi-Supervised Clustering Algorithms with Active Query Selection. Advances in Systems Science and Applications, 19(4), 25–44. https://doi.org/10.25728/assa.2019.19.4.659