This paper proposes a method for incorporating syntax-based language models into phrase-based statistical machine translation (SMT) systems. The syntax-based language model is built on link grammar, a highly lexicalized formalism. To apply link-grammar language models within phrase-based models, we introduce the concept of linked phrases, an extension of the traditional phrases used in phrase-based models. Experimental results show that syntax-based language models can greatly improve the performance of phrase-based models.
Chinese organization name recognition is an important but difficult task in natural language processing. To reduce the amount of tagged corpus required and to make use of untagged corpus, we propose combining Co-training with support vector machines (SVM) and conditional random fields (CRF) to improve recognition results. Following the principles of uncorrelatedness and compatibility, we construct classifiers from different views, using SVM alone, CRF alone, and a combination of the two models. We also modify a heuristic algorithm for selecting untagged samples to reduce time complexity. Experimental results show that, with the same amount of tagged data, Co-training achieves an F-measure 10% higher than SVM or CRF alone; to reach the same F-measure, Co-training requires up to 70% less tagged data.
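The Co-training loop described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a toy nearest-centroid classifier stands in for SVM and CRF, each sample is assumed to have exactly two numeric views, and the margin-based selection rule is a hypothetical substitute for the modified heuristic selection algorithm. All function names are illustrative.

```python
from statistics import mean


def train_centroids(samples):
    """Fit a toy classifier: samples is a list of (value, label); returns label -> mean value."""
    by_label = {}
    for x, y in samples:
        by_label.setdefault(y, []).append(x)
    return {y: mean(xs) for y, xs in by_label.items()}


def predict(centroids, x):
    """Return (label, margin): nearest centroid and its lead over the runner-up.

    Assumes at least two labels are present in centroids.
    """
    ranked = sorted(centroids, key=lambda y: abs(x - centroids[y]))
    best, second = ranked[0], ranked[1]
    margin = abs(x - centroids[second]) - abs(x - centroids[best])
    return best, margin


def co_train(labeled, unlabeled, rounds=3, per_round=1):
    """Grow the labeled set by letting each view label its most confident samples.

    labeled:   list of ((view_a, view_b), label)
    unlabeled: list of (view_a, view_b)
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        for view in (0, 1):
            if not pool:
                break
            # Train on one view only, then score the untagged pool with it.
            model = train_centroids([(x[view], y) for x, y in labeled])
            scored = [(predict(model, x[view]), x) for x in pool]
            scored.sort(key=lambda item: item[0][1], reverse=True)
            # The most confident predictions become training data for both views.
            for (label, _), x in scored[:per_round]:
                labeled.append((x, label))
                pool.remove(x)
    models = [train_centroids([(x[v], y) for x, y in labeled]) for v in (0, 1)]
    return models, labeled


if __name__ == "__main__":
    seed = [((0.0, 0.1), "O"), ((1.0, 0.9), "ORG")]
    untagged = [(0.1, 0.0), (0.9, 1.0), (0.2, 0.3), (0.8, 0.7)]
    models, grown = co_train(seed, untagged, rounds=2)
    print(len(grown), predict(models[0], 0.15)[0])
```

The key design point is that each view's classifier is trained independently, and the samples it labels with the highest confidence are added to the shared training set, so the two views teach each other from untagged data.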
With the development of Web 2.0, more and more social community applications have appeared. Typical examples are blogs and Facebook. The most important feature of these applications is that they act as self-media: users can post their own ideas on the Internet. Through the use of these applications, a large social network is formed. To study the features of a social network, it is important to first mine information about individuals. In this paper, we propose a User Role based method to mine the relation between a user and an object. First, we extract User Roles from the semantic dictionary WordNet. Then, we mine the features of each User Role by considering hypernymy and hyponymy relations. Finally, we use these features to deduce the User Role. In our experiments, we use a large corpus from TREC 2006 to test mining performance. The experimental results show that the User Role effectively captures the features of users.
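As a rough illustration of how hypernymy and hyponymy relations might be used to deduce a role, the sketch below walks a hand-written taxonomy. The `HYPERNYM` table is hypothetical toy data, not real WordNet content, and the function names and the "most specific role wins" rule are assumptions rather than the method described in this abstract.

```python
# Hypothetical hand-written hypernym links (child -> parent); not real WordNet data.
HYPERNYM = {
    "pianist": "musician",
    "guitarist": "musician",
    "musician": "performer",
    "performer": "person",
    "surgeon": "doctor",
    "doctor": "professional",
    "professional": "person",
}


def hypernym_chain(word):
    """Return the word followed by its ancestors, most specific first."""
    chain = [word]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain


def hyponyms(role):
    """Return, in sorted order, every word whose ancestors include the given role."""
    return sorted(w for w in HYPERNYM if role in hypernym_chain(w)[1:])


def deduce_role(word, roles):
    """Return the most specific candidate role on the word's hypernym chain, or None."""
    for term in hypernym_chain(word):
        if term in roles:
            return term
    return None


if __name__ == "__main__":
    candidate_roles = {"musician", "doctor"}
    print(hypernym_chain("pianist"))
    print(hyponyms("musician"))
    print(deduce_role("surgeon", candidate_roles))
```

With a real lexical database, the toy table would be replaced by lookups against the database's own hypernym and hyponym links.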