Phosphorylation is a crucial mechanism for controlling protein activity in vivo in many eukaryotic organisms. Experimental methods for determining phosphorylation sites in substrates are usually limited by the in vitro conditions required by the enzymes, and they are time- and labor-intensive. Although several in silico methods and web servers have been introduced for the automatic detection of phosphorylation sites, more sophisticated methods are still urgently needed to improve prediction performance. Protein primary sequences can help predict the phosphorylation sites catalyzed by different protein kinases (PKs), and most computational approaches make predictions from a short local peptide around the site. However, restricting the input to a short local peptide may discard useful information carried by conserved residues farther from the phosphorylation site, which can degrade the prediction results. In this paper we present a novel prediction method, IEPP (Information-Entropy based Phosphorylation Prediction), for the automatic detection of potential phosphorylation sites. In IEPP, the positions around a phosphorylation site are selected or excluded according to their entropy values. The algorithm was compared with other methods, such as GPS and PPSP, on the ABL, MAPK and PKA PK families, and it achieved higher prediction accuracy in several measures, including sensitivity (Sn) and specificity (Sp). Furthermore, in a comparison with several online prediction web servers on newly discovered phosphorylation sites, IEPP also yielded the best performance. IEPP is thus another useful computational resource for identifying PK-specific phosphorylation sites, with the advantages of simplicity, efficiency and convenience.
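The entropy-based position selection described above can be illustrated with a small sketch. It computes the Shannon entropy (in bits, from observed residue frequencies without pseudocounts) at each position of aligned peptide windows centered on known phosphorylation sites, and keeps the positions whose entropy falls below a threshold, i.e. the more conserved ones. The window length, the threshold value, the helper names `position_entropies` and `select_positions`, and the example peptides are hypothetical choices for illustration; the abstract does not specify IEPP's exact selection criterion or how the retained positions are then scored.

```python
import math
from collections import Counter


def position_entropies(peptides):
    """Shannon entropy (bits) of the residue distribution at each aligned position.

    Assumption: every peptide is a window of the same length centered on a
    known phosphorylation site; frequencies are raw counts (no pseudocounts).
    """
    if not peptides:
        raise ValueError("need at least one peptide")
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("all peptides must have the same length")
    n = len(peptides)
    entropies = []
    for i in range(length):
        counts = Counter(p[i] for p in peptides)
        # H_i = -sum_a f(a) * log2 f(a), over residues a observed at position i
        h = sum(-(c / n) * math.log2(c / n) for c in counts.values())
        entropies.append(h)
    return entropies


def select_positions(peptides, threshold):
    """Indices of positions with entropy below `threshold` (hypothetical criterion)."""
    return [i for i, h in enumerate(position_entropies(peptides)) if h < threshold]


# Hypothetical 7-residue windows (-3..+3) around a phosphoserine at index 3.
peptides = ["RRASLGA", "RRGSVPA", "RRLSEKA", "RKASFQA"]
print([round(h, 3) for h in position_entropies(peptides)])
# -> [0.0, 0.811, 1.5, 0.0, 2.0, 2.0, 0.0]
print(select_positions(peptides, threshold=1.0))
# -> [0, 1, 3, 6]
```

In this toy set the central serine and the flanking arginine at index 0 are fully conserved (entropy 0), while indices 4 and 5 carry four different residues in four sequences and reach the maximum of 2 bits, so they are excluded. Note that the central residue is trivially conserved in any such window.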
WANG MingHui, LI ChunHua, CHEN WeiZu & WANG CunXin
College of Life Science and Bioengineering, Beijing University of Technology, Beijing 100022, China
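Phosphorylation plays important roles in many eukaryotic cells. Because experimental methods for determining protein kinase substrates are subject to many limitations and are time- and labor-consuming, fast and automatic machine learning methods are urgently needed. The primary sequence of a protein can be used to predict the phosphorylation sites acted on by different kinase families, and such predictions also complement and guide experiments. Processing only the short peptide sequence near a phosphorylation site loses considerable information and affects the prediction results. We propose an information-entropy based method for phosphorylation site prediction, IEPP (information-entropy based phosphorylation prediction), which uses entropy to select and exclude the amino acid positions around a phosphorylation site so that only positions informative for phosphorylation take part in the prediction. Tests on three representative kinase families, ABL, MAPK and PKA, show that both sensitivity (Sn) and specificity (Sp) are better than those of the recent PPSP and GPS algorithms. In real-time tests against several online prediction servers, such as Scansite, IEPP also performed better. These results demonstrate that the proposed scheme is an effective method for predicting phosphorylation sites, with the advantages of simplicity, efficiency and good real-time performance.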
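The sensitivity (Sn) and specificity (Sp) used to compare IEPP with other predictors are conventionally defined as Sn = TP / (TP + FN) and Sp = TN / (TN + FP), where TP, FN, TN and FP count true positives, false negatives, true negatives and false positives. The sketch below computes both from binary labels; the function name `sensitivity_specificity` and the toy labels are hypothetical, and the abstract does not state how the paper's test sets or decision thresholds were constructed.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (Sn, Sp) for binary labels, where 1 marks a phosphorylation site."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative example")
    sn = tp / (tp + fn)  # fraction of true sites that are predicted as sites
    sp = tn / (tn + fp)  # fraction of non-sites correctly rejected
    return sn, sp


# Hypothetical toy labels: 3 known sites followed by 4 non-sites.
sn, sp = sensitivity_specificity([1, 1, 1, 0, 0, 0, 0], [1, 1, 0, 0, 0, 1, 0])
print(round(sn, 3), round(sp, 3))  # -> 0.667 0.75
```

Reporting both measures matters because a predictor can trivially raise one at the expense of the other, for example by labeling every candidate as a site (Sn = 1, Sp = 0).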