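AdaBoost is a typical ensemble learning framework that builds a strong learner as a linear combination of several weak classifiers. Its classification accuracy is far higher than that of any single weak classifier, and it achieves low training and generalization error. However, AdaBoost cannot reduce the number of weak classifiers in its output model and therefore lacks good interpretability. This paper introduces a genetic algorithm into the AdaBoost model and proposes an ensemble evolutionary classification algorithm that limits the size of the output model (Ensemble evolve classification algorithm for controlling the size of final model, ECSM). Through genetic operations and an evaluation function, the algorithm forcibly preserves the diversity of the population within the AdaBoost iterative framework while retaining the better classifiers. Experimental results show that, compared with the classical AdaBoost algorithm, the proposed algorithm greatly reduces the number of classifiers while largely maintaining classification accuracy.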
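The general idea of combining boosting with an evolutionary search over a size-limited ensemble can be illustrated with a minimal sketch. It is not the ECSM algorithm itself: the abstract does not specify the genetic operators or the evaluation function, so the decision-stump weak learner, the bit-mask encoding, the one-point crossover, the single-bit mutation, the `repair` step, and the use of training accuracy as fitness are all assumptions made for illustration. All function names (`make_toy_data`, `train_adaboost`, `evolve_subset`, and so on) are hypothetical.

```python
import math
import random

def make_toy_data(n=60, seed=1):
    """Hypothetical toy data: 2-D points labelled by the side of the line x0 + x1 = 1."""
    rng = random.Random(seed)
    X = [(rng.random(), rng.random()) for _ in range(n)]
    y = [1 if a + b > 1.0 else -1 for a, b in X]
    return X, y

def stump_predict(stump, x):
    """A decision stump (feature, threshold, sign) predicts sign above the threshold."""
    feature, threshold, sign = stump
    return sign if x[feature] > threshold else -sign

def train_adaboost(X, y, rounds):
    """Standard discrete AdaBoost; returns a list of (alpha, stump) pairs."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        best = None
        # Exhaustively pick the stump with the lowest weighted error.
        for f in range(len(X[0])):
            for t in sorted({x[f] for x in X}):
                for s in (1, -1):
                    stump = (f, t, s)
                    err = sum(wi for wi, xi, yi in zip(w, X, y)
                              if stump_predict(stump, xi) != yi)
                    if best is None or err < best[0]:
                        best = (err, stump)
        err, stump = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Re-weight the samples so that misclassified points gain weight.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def accuracy(ensemble, mask, X, y):
    """Accuracy of the weighted vote over the stumps whose mask bit is 1."""
    correct = 0
    for xi, yi in zip(X, y):
        score = sum(a * stump_predict(s, xi)
                    for (a, s), keep in zip(ensemble, mask) if keep)
        pred = 1 if score > 0 else -1
        correct += int(pred == yi)
    return correct / len(X)

def repair(mask, max_size, rng):
    """Assumed repair step: enforce 1 <= number of selected stumps <= max_size."""
    on = [i for i, bit in enumerate(mask) if bit]
    while len(on) > max_size:
        mask[on.pop(rng.randrange(len(on)))] = 0
    if not on:
        mask[rng.randrange(len(mask))] = 1
    return mask

def evolve_subset(ensemble, X, y, max_size, generations=30, pop_size=20, seed=0):
    """Toy genetic search for a subset of at most max_size weak classifiers."""
    rng = random.Random(seed)
    T = len(ensemble)

    def fitness(mask):
        return accuracy(ensemble, mask, X, y)

    pop = [repair([rng.randint(0, 1) for _ in range(T)], max_size, rng)
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:pop_size // 2]          # keep the better half unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, T)          # one-point crossover
            child = a[:cut] + b[cut:]
            j = rng.randrange(T)               # single-bit mutation
            child[j] = 1 - child[j]
            children.append(repair(child, max_size, rng))
        pop = parents + children
    return max(pop, key=fitness)
```

Calling `train_adaboost(X, y, rounds=10)` and then `evolve_subset(ensemble, X, y, max_size=3)` returns a bit mask that selects at most three of the ten stumps. Comparing `accuracy` for that mask against the all-ones mask shows how much training accuracy the smaller model gives up on the toy data.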
A minimum description length (MDL) criterion is proposed for choosing a good partition of a bipartite network. A heuristic algorithm based on combination theory is presented to approximate the optimal partition. Because the heuristic algorithm automatically searches for the number of partitions, no user intervention is required. Experiments on various datasets show that the method generates higher-quality results than two state-of-the-art methods, cross-association and bipartite recursively induced modules. The experimental results also show that the proposed algorithm scales well. Finally, the method is applied to a traditional Chinese medicine (TCM) formula and Chinese herbal network, whose community structure is not well known, and it detects significant and informative community divisions.
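As a rough illustration of how an MDL criterion can rank candidate partitions of a bipartite network, the sketch below scores a partition with a generic two-part code. The exact criterion of the paper is not given in the abstract, so the cost terms here are assumptions: the model cost pays for the group label of every node plus the edge count of every block, and the data cost encodes each block with the entropy of its edge density. The names `binary_entropy` and `description_length` are hypothetical.

```python
import math

def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) source; zero for a pure block."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def description_length(adj, row_groups, col_groups):
    """Two-part code length (in bits) of a 0/1 biadjacency matrix under a partition.

    adj        : list of rows, each a list of 0/1 entries
    row_groups : group index (0-based) of each row node
    col_groups : group index (0-based) of each column node
    """
    n_rows, n_cols = len(adj), len(adj[0])
    k, l = max(row_groups) + 1, max(col_groups) + 1
    ones = [[0] * l for _ in range(k)]
    cells = [[0] * l for _ in range(k)]
    for i in range(n_rows):
        for j in range(n_cols):
            g, h = row_groups[i], col_groups[j]
            cells[g][h] += 1
            ones[g][h] += adj[i][j]
    # Data cost: each block is encoded with the entropy of its edge density.
    data_bits = sum(cells[g][h] * binary_entropy(ones[g][h] / cells[g][h])
                    for g in range(k) for h in range(l) if cells[g][h])
    # Model cost (assumed form): group labels plus the number of edges per block.
    model_bits = n_rows * math.log2(k) + n_cols * math.log2(l)
    model_bits += sum(math.log2(cells[g][h] + 1)
                      for g in range(k) for h in range(l))
    return model_bits + data_bits
```

Under this assumed cost, a partition that matches the block structure of the matrix gets a shorter description than either a single all-in-one group or a partition that mixes the blocks. A search heuristic would then look for the partition, and the number of groups, that minimizes `description_length`.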