In recent years, a large number of semi-supervised classification algorithms have been proposed. In real learning tasks, however, it is hard for practitioners to decide which semi-supervised classification algorithm to choose, and there is little guidance on this question. Semi-supervised classification algorithms can be categorized by their assumptions about the data distribution. On the basis of a comparative analysis of representative semi-supervised classification algorithms built on different assumptions, and taking the least squares method (Least Squares, LS) as the baseline, this work compares the classification performance of the Transductive Support Vector Machine (TSVM), which relies on the cluster assumption; Laplacian Regularized Least Squares Classification (Lap RLSC), which relies on the manifold assumption; SemiBoost, which exploits both assumptions; and Implicitly Constrained Least Squares (ICLS), which makes no distributional assumption. The conclusions are as follows: when the distribution of the data is known, a method based on the corresponding assumption achieves high classification accuracy; when no prior knowledge of the data distribution is available and the number of samples is limited, TSVM attains high classification accuracy; and when labels are hard to obtain and the safety of the classifier is emphasized, ICLS is the preferred choice, with Lap RLSC also a good option.
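The LS baseline mentioned above is the simplest of the compared methods. As a point of reference, the following is a minimal sketch (my own illustration, not the paper's code) of a least-squares classifier: encode the two classes as ±1 labels, solve the normal equations, and predict with the sign of the linear response.

```python
import numpy as np

def ls_fit(X, y):
    """Least-squares classifier: labels are +/-1; weights come from the
    normal equations, with a pseudo-inverse to handle rank deficiency."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    return np.linalg.pinv(Xb) @ y

def ls_predict(w, X):
    """Predict class labels as the sign of the linear response."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.sign(Xb @ w)

# Toy check on two well-separated Gaussian blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.5, (50, 2)), rng.normal(2, 0.5, (50, 2))])
y = np.hstack([-np.ones(50), np.ones(50)])
w = ls_fit(X, y)
acc = np.mean(ls_predict(w, X) == y)
print(acc)
```

The semi-supervised methods compared in the abstract (TSVM, Lap RLSC, SemiBoost, ICLS) all try to improve on this baseline by additionally exploiting unlabeled samples under their respective assumptions.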
Indefinite kernels have attracted increasing attention in machine learning because they apply more broadly than the usual positive definite kernels. Research on indefinite kernel clustering, however, remains relatively scarce: existing clustering methods are mainly designed for positive definite kernels and cannot handle indefinite ones. In this paper, we propose a novel indefinite kernel clustering algorithm, termed indefinite kernel maximum margin clustering (IKMMC), based on the state-of-the-art maximum margin clustering (MMC) model. IKMMC seeks a proxy positive definite kernel to approximate the original indefinite one, and to this end embeds an F-norm regularizer in the objective function to measure the difference between the two kernels; the resulting problem is optimized by an iterative approach. Concretely, at each iteration, given a set of initial class labels, IKMMC first transforms the clustering problem into a classification problem, solved by an indefinite kernel support vector machine (IKSVM) with an extra class-balance constraint; the predicted labels then serve as the input class labels of the next iteration, until the prediction error rate falls below a prespecified tolerance. Finally, IKMMC takes the predicted labels of the last iteration as the cluster indices. Moreover, we extend IKMMC from binary clustering problems to more complex multi-class scenarios. Experimental results demonstrate the superiority of our algorithms.
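To illustrate the proxy-kernel idea, the sketch below shows the unregularized core of the construction: the positive semidefinite matrix closest to a given symmetric matrix in Frobenius (F-) norm is obtained by clipping its negative eigenvalues to zero. This is only a simplified stand-in for the paper's regularized optimization; the tanh (sigmoid) kernel used here is a common example of a kernel that is indefinite in general.

```python
import numpy as np

def nearest_psd(K):
    """Frobenius-nearest positive semidefinite approximation of K:
    symmetrize, eigendecompose, and clip negative eigenvalues to zero."""
    K = (K + K.T) / 2.0
    vals, vecs = np.linalg.eigh(K)
    return (vecs * np.clip(vals, 0.0, None)) @ vecs.T

# Sigmoid (tanh) kernel on random data; such kernels need not be PSD.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 3))
K = np.tanh(X @ X.T - 1.0)

K_psd = nearest_psd(K)
gap = np.linalg.norm(K - K_psd)           # F-norm distance between the two kernels
min_eig = np.linalg.eigvalsh(K_psd).min()  # should be (numerically) nonnegative
print(gap, min_eig)
```

In IKMMC the analogous F-norm term is a regularizer inside the clustering objective rather than a one-shot projection, so the proxy kernel is updated jointly with the labels during the iterative optimization.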