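Deep Web technology makes the large amount of useful information hidden behind query interfaces easier for users to find. However, as the number of data sources grows, quickly finding suitable results among them becomes increasingly important. Ranking data sources by traditional link analysis and relevance evaluation can no longer meet the demand for high precision. This paper proposes an algorithm that judges the quality of data sources through sampling and data quality evaluation. The proposed sampling method improves on stratified sampling and snowball sampling so that a small number of sample points can accurately reflect the characteristics of the whole. Six main quality criteria that largely reflect how good a data source is are defined, together with methods for computing them; the quality of a data source is then quantified by combining these criteria with a weight vector. In the experiments, data sources are sampled and analyzed, the expected value of each data source's score is computed, and the data sources are ranked overall by this expected value. The results show that using sampling to estimate and score the data quality of data sources offers good accuracy and practicality.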
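The scoring step described above (per-criterion scores combined with a weight vector, then an expected score estimated from samples) might be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name the six criteria, so the labels `c1` to `c6` are placeholders, and the plain stratified mean used here stands in for the paper's improved stratified/snowball sampling, which is not specified.

```python
from statistics import mean

# Placeholder names: the abstract defines six criteria but does not list them.
CRITERIA = ["c1", "c2", "c3", "c4", "c5", "c6"]


def record_score(scores, weights):
    """Weighted sum of one sampled record's per-criterion scores (each in [0, 1])."""
    return sum(weights[c] * scores[c] for c in CRITERIA)


def expected_source_score(strata, weights):
    """Estimate a source's expected score from stratified samples.

    `strata` is a list of (share, samples) pairs, where `share` is the
    stratum's relative size and `samples` is a non-empty list of
    per-criterion score dicts. Plain stratified mean (an assumption).
    """
    total_share = sum(share for share, _ in strata)
    return sum(
        (share / total_share) * mean(record_score(s, weights) for s in samples)
        for share, samples in strata
    )


def rank_sources(sources, weights):
    """Return source names ordered from highest to lowest expected score."""
    return sorted(
        sources,
        key=lambda name: expected_source_score(sources[name], weights),
        reverse=True,
    )
```

With uniform weights, a source whose sampled records score higher on every criterion ranks first; changing the weight vector shifts the ranking toward whichever criteria matter most to the user.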
In this paper, we present a novel approach that uses attribute correlation for the sampling task on nonuniform hidden databases. We propose a method for calculating attribute dependency and construct a sampling template according to that dependency. We then use the sampling template to generate initial sampling queries and propose a bottom-up algorithm to search the sampling template. Extensive experiments over real deep Web sites and controlled databases show that our sampling method performs well in both quality and efficiency.
TIAN Jianwei, LI Shijun, TANG Xiaoyue School of Computer, Wuhan University, Wuhan 430072, Hubei, China
In this paper, we analyze an immunization strategy for the SEIQ (susceptible, eclipse, infected, quarantine) model in small-world networks by associating the immunization probability with the infection probability. First, based on mean-field theory, we establish the transmission dynamics equation for the SEIQ model and find the critical immunization threshold, which depends on the topology of the network, the infection rates of the eclipse and infected states, the density of quarantined individuals, and so on. We then explain the influence of the immunization probability on the transmission of infectious disease. Finally, by simulating disease propagation under this model and comparing the results with the theoretical results, we find that this kind of immunization strategy is effective for the SEIQ model in small-world complex networks.
CHEN Shengshuang, ZHOU Jiahua, WEN Lijuan, HUANG Zhangcan School of Science, Wuhan University of Technology, Wuhan 430070, Hubei, China
An effective way to improve the efficiency of applications based on formal concept analysis (FCA) is to construct only the part of the concept lattice that the application needs. Inspired by this idea, this paper introduces an approach, based on subposition assembly, that constructs a lower concept semi-lattice called the non-frequent concept semi-lattice. First, we present the theoretical framework of subposition assembly for the non-frequent concept semi-lattice. Second, we propose an algorithm called Nocose based on this framework. Experiments demonstrate both the theoretical correctness and the practicability of the Nocose algorithm.
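To make the notion of a lower part of a concept lattice concrete, the sketch below enumerates the formal concepts of a tiny context by brute force and keeps those whose extent is small. It is not the Nocose algorithm and does not use subposition assembly. The reading of "non-frequent" as "extent size at most `max_support`" is an assumption made for illustration, and all function names are hypothetical.

```python
from itertools import combinations


def common_attributes(context, objects):
    """Attributes shared by every object in `objects` (all attributes if empty)."""
    attrs = set().union(*context.values())
    for obj in objects:
        attrs &= context[obj]
    return frozenset(attrs)


def common_objects(context, attrs):
    """Objects that possess every attribute in `attrs`."""
    return frozenset(o for o, a in context.items() if attrs <= a)


def concepts(context):
    """All formal concepts (extent, intent), by closing every object subset.

    Exponential in the number of objects; suitable for toy contexts only.
    """
    found = set()
    objs = list(context)
    for r in range(len(objs) + 1):
        for subset in combinations(objs, r):
            intent = common_attributes(context, subset)
            extent = common_objects(context, intent)
            found.add((extent, intent))
    return found


def non_frequent_concepts(context, max_support):
    """Concepts with at most `max_support` objects (assumed definition)."""
    return {(e, i) for e, i in concepts(context) if len(e) <= max_support}
```

Because a subconcept always has a smaller extent than its superconcepts, the concepts selected this way are closed under meets and form a lower portion of the full lattice, which is the kind of partial structure the abstract aims to build without computing the whole lattice first.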