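This paper proposes ODListsort (ordered double-end linked list sort), a new comparison sorting algorithm based on ordered double-ended linked lists. The algorithm first fixes the maximum number of linked lists that may coexist, then sorts the data set by creating lists, inserting data according to a set of rules, and merging lists. In ODListsort, data elements are stored in linked lists with dynamically allocated memory, which gives it better performance than some classical sorting algorithms. Experimental results show that on random data sets ODListsort runs at a speed close to quicksort and faster than merge sort, selection sort, insertion sort, and bubble sort; on ordered data sets it is far more efficient than quicksort and slightly more efficient than merge sort.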
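The abstract above names the three steps (creating lists, rule-based insertion, merging) but not the exact rules. The following is a minimal Python sketch of one plausible reading, not the authors' implementation. The insertion rule (append at the tail, prepend at the head, otherwise open a new list), the merge-when-full policy, and the names `odlist_sort` and `max_lists` are all assumptions made for illustration; `collections.deque` stands in for the double-ended linked list.

```python
from collections import deque
from heapq import merge


def odlist_sort(data, max_lists=8):
    """Sort `data` using a bounded set of ordered double-ended lists.

    Assumed rules (not taken from the paper): a value is appended to the
    first list whose tail it is >= to, or prepended to the first list whose
    head it is <= to. If it fits nowhere, a new list is opened. When the
    limit `max_lists` is reached, all lists are merged into one first.
    """
    if max_lists < 2:
        raise ValueError("max_lists must be at least 2")
    lists = []  # every deque is kept in non-decreasing order
    for x in data:
        for d in lists:
            if x >= d[-1]:
                d.append(x)        # extends the list at its tail
                break
            if x <= d[0]:
                d.appendleft(x)    # extends the list at its head
                break
        else:
            if len(lists) >= max_lists:
                # Limit reached: collapse all ordered lists into one.
                lists = [deque(merge(*lists))]
            lists.append(deque([x]))
    # Final k-way merge of the remaining ordered lists.
    return list(merge(*lists))
```

Under these assumed rules, already sorted or reverse-sorted input only ever extends a single list at one end, which is consistent with the abstract's claim of good behavior on ordered data. For example, `odlist_sort([5, 2, 9, 1, 5, 6], max_lists=2)` yields the values in non-decreasing order.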
Microblog is a social platform with a huge user community and massive data. We propose a semantic recommendation mechanism for microblog based on sentiment analysis. First, keywords and sentiment words are extracted by natural language processing, including segmentation, lexical analysis and strategy selection. Then, we query a background knowledge base built on linked open data (LOD) using the users' basic information. Experimental results show that, with sentiment analysis and semantic query, recommendation accuracy lies in the range of 70%-89%. Compared with traditional recommendation methods, this method satisfies users' requirements considerably better.
Because traditional cache replacement strategies are not tailored to the semantic cache used in extensible markup language (XML) algebra query processing, a replacement strategy based on the semantic cache contribution value is proposed. First, pattern matching rules for XML algebra queries and semantic caches are given. Second, a method for calculating the semantic cache contribution value is proposed. On XML documents of four different sizes, time-efficiency experiments show that this strategy supports the XML algebra query environment and has better time efficiency than both least frequently used (LFU) and least recently used (LRU).
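The abstract does not state how the contribution value is calculated, so the Python sketch below only illustrates the general shape of a replacement policy that evicts the entry with the lowest accumulated score. The class name `ContributionCache`, the `gain` and `initial` parameters, and the idea of crediting a caller-supplied amount on each hit are hypothetical placeholders, not the paper's formula.

```python
class ContributionCache:
    """Fixed-capacity cache that evicts the lowest-contribution entry.

    The contribution score here is a placeholder: the caller credits
    `gain` on every hit. The paper's actual formula is not reproduced.
    """

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._results = {}       # query pattern -> cached result
        self._contribution = {}  # query pattern -> accumulated score

    def get(self, pattern, gain=1.0):
        """Return the cached result (or None), crediting `gain` on a hit."""
        if pattern not in self._results:
            return None
        self._contribution[pattern] += gain
        return self._results[pattern]

    def put(self, pattern, result, initial=0.0):
        """Store a result, evicting the lowest-scoring entry if full."""
        if pattern not in self._results and len(self._results) >= self.capacity:
            victim = min(self._contribution, key=self._contribution.get)
            del self._results[victim]
            del self._contribution[victim]
        self._results[pattern] = result
        # An existing entry keeps its score; a new entry starts at `initial`.
        self._contribution.setdefault(pattern, initial)
```

Unlike LRU or LFU, the eviction decision depends only on the score, so an entry that saved a lot of query time once could outlive one that is hit often but cheaply, provided the caller passes a `gain` that reflects that saving.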
In data-stream or web scenarios where data arrive at highly variable and unpredictable rates, a good join algorithm should be able to "hide" the delays by continuing to output join results. Non-blocking algorithms allow some tuples to be flushed to disk, with the goal of producing results continuously while data transmission is suspended. However, state-of-the-art algorithms struggle under the constraint of limited allocated memory. To make better use of memory, a novel non-blocking join algorithm based on hash-merge is proposed to improve query response times. A reduced data structure for in-memory tuples helps to improve memory utilization. A replacement selection tree is used to adjust memory by expanding or shrinking the tree and to split one external join transaction into multiple subtasks. In addition, a cost model that estimates the task output rate is proposed to select the on-disk portion that promises to produce results fastest in the external join stage. Experiments show that the technique, with far less memory, delivers results faster than three non-blocking join algorithms (XJoin, HMJ and RPJ), with up to an almost two-fold improvement on reliable networks and an order-of-magnitude improvement on unreliable networks in terms of the number of reported tuples.
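To make the notion of a non-blocking join concrete, the following Python sketch shows a plain symmetric hash join, the usual in-memory building block for this family of algorithms. It is not the proposed hash-merge algorithm: it has no disk flushing, no replacement selection tree and no cost model. The function name, the `"L"`/`"R"` side labels and the `key` parameter are assumptions chosen for illustration.

```python
from collections import defaultdict


def symmetric_hash_join(stream, key=lambda t: t[0]):
    """Yield join results as soon as matching tuples have both arrived.

    `stream` yields (side, tuple) pairs in arrival order, where side is
    "L" or "R" (hypothetical labels). Each arriving tuple is stored in its
    own side's hash table and immediately probes the other side's table,
    so output never waits for either input to finish.
    """
    tables = {"L": defaultdict(list), "R": defaultdict(list)}
    for side, tup in stream:
        other = "R" if side == "L" else "L"
        k = key(tup)
        tables[side][k].append(tup)
        # Probe without creating empty buckets on the other side.
        for match in tables[other].get(k, ()):
            yield (tup, match) if side == "L" else (match, tup)
```

Because both hash tables grow without bound here, a practical algorithm must decide which tuples to move to disk when memory fills up, which is the problem the abstract addresses.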