Voice conversion (VC) based on the Gaussian mixture model (GMM) is the most classic and common method for converting a source spectrum to a target spectrum. However, this method is prone to over-fitting because of its frame-by-frame conversion. This paper presents VC with non-negative matrix factorization (NMF), which can keep the spectrum from over-fitting by adjusting the size of the basis-vector set (dictionary). To better realize the non-linear mapping, kernel NMF (KNMF) is adopted for spectrum mapping. In addition, to increase conversion accuracy, KNMF combined with GMM (GKNMF) is also introduced into VC. Finally, KNMF, GKNMF, GMM, principal component regression (PCR), PCR combined with GMM (GPCR), partial least squares regression (PLSR), NMF correlation-based frequency warping (NMF-CFW), and deep neural network (DNN) methods are compared. The proposed GKNMF achieves better performance in both objective and subjective evaluations.
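To make the role of the dictionary concrete, the following is a minimal sketch of plain dictionary-based NMF spectral mapping, not the kernel (KNMF) or GMM-combined (GKNMF) variants, whose details the abstract does not give. It assumes paired source and target dictionaries whose columns are aligned, estimates non-negative activations for the source spectrum with the source dictionary held fixed, and applies the same activations to the target dictionary. The names `estimate_activations`, `convert`, `A_src`, and `A_tgt`, the Euclidean cost, and the random toy data are all illustrative assumptions.

```python
import numpy as np


def estimate_activations(X, A, n_iter=200, eps=1e-12, seed=0):
    """Find H >= 0 with X ~= A @ H while the dictionary A stays fixed.

    Uses the standard multiplicative update for the Euclidean cost.
    X: (n_bins, n_frames) non-negative spectra
    A: (n_bins, n_atoms) non-negative dictionary
    """
    rng = np.random.default_rng(seed)
    H = rng.random((A.shape[1], X.shape[1])) + eps
    AtX = A.T @ X
    AtA = A.T @ A
    for _ in range(n_iter):
        H *= AtX / (AtA @ H + eps)
    return H


def convert(X_src, A_src, A_tgt, n_iter=200):
    """Map source spectra to the target space by sharing activations."""
    H = estimate_activations(X_src, A_src, n_iter=n_iter)
    return A_tgt @ H


# Toy data (assumption): random non-negative dictionaries, not real speech.
rng = np.random.default_rng(1)
n_bins, n_atoms, n_frames = 16, 8, 5  # n_atoms is the dictionary size
A_src = rng.random((n_bins, n_atoms))
A_tgt = rng.random((n_bins, n_atoms))
X_src = A_src @ rng.random((n_atoms, n_frames))

Y_hat = convert(X_src, A_src, A_tgt)
```

In this sketch `n_atoms` plays the part of the dictionary size that the abstract describes as the adjustable quantity: a smaller dictionary constrains the reconstruction more strongly, while a larger one fits the source frames more closely.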
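To address the problems that the traditional Bag of Visual Words (BOVW) model lacks spatial information and cannot adequately express the features shared by images of the same category, a visual-word representation method based on maximal frequent itemsets is proposed. After excluding isolated feature points, the method introduces annular region partitioning to embed more spatial information. Frequent-itemset mining over the visual words in different rings yields a new visual-word representation, which effectively increases the similarity of visual words among images of the same category while making the differences between categories more pronounced. Classification experiments on the COREL and Caltech-256 image datasets verify the effectiveness and feasibility of the method.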
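The two ingredients named above, ring partitioning and maximal frequent itemset mining, can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how the rings are centred or sized, how transactions are formed, or which mining algorithm is used. Here the rings are equal-width and centred on the keypoint centroid, each ring contributes one transaction, and the miner is a brute-force level-wise search suitable only for toy vocabularies. The removal of isolated feature points is omitted, and the function names `ring_transactions` and `maximal_frequent_itemsets` are hypothetical.

```python
from itertools import combinations

import numpy as np


def ring_transactions(points, words, n_rings=3):
    """Group visual-word ids by concentric ring around the keypoint centroid.

    points: (n, 2) keypoint coordinates; words: n visual-word ids.
    Returns one frozenset of word ids per ring (innermost first).
    """
    points = np.asarray(points, dtype=float)
    dist = np.linalg.norm(points - points.mean(axis=0), axis=1)
    r_max = dist.max() + 1e-9  # avoid an index equal to n_rings
    ring = np.minimum((dist / r_max * n_rings).astype(int), n_rings - 1)
    return [
        frozenset(w for w, r in zip(words, ring) if r == k)
        for k in range(n_rings)
    ]


def maximal_frequent_itemsets(transactions, min_support):
    """Return frequent itemsets that have no frequent proper superset."""
    items = sorted(set().union(*transactions))
    frequent = []
    level = [frozenset([i]) for i in items]
    while level:
        # Keep candidates contained in at least min_support transactions.
        level = [
            s for s in level
            if sum(s <= t for t in transactions) >= min_support
        ]
        frequent.extend(level)
        # Join frequent k-itemsets into (k+1)-item candidates.
        level = list({
            a | b for a, b in combinations(level, 2)
            if len(a | b) == len(a) + 1
        })
    return [s for s in frequent if not any(s < t for t in frequent)]


# Toy example (assumption): five keypoints, two rings.
rings = ring_transactions(
    [(0, 0), (1, 0), (-1, 0), (0, 4), (0, -4)], [5, 6, 6, 7, 8], n_rings=2
)

# Toy transactions (assumption): e.g. ring-level word sets from several images.
toy = [frozenset({1, 2, 3}), frozenset({1, 2}),
       frozenset({2, 3}), frozenset({1, 2, 3})]
mfi = maximal_frequent_itemsets(toy, min_support=3)
```

In a fuller pipeline the maximal frequent itemsets mined from same-category images would serve as the new, more discriminative visual-word units; how they are then encoded for the classifier is not specified in the abstract and is left out here.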