It is of great importance to identify new cancer genes from the data of large scale genome screenings of gene mutations in cancers. Considering the alternations of some essential functions are indispensable for oncogenesis, we define them as cancer functions and select, as their approximations, a group of detailed functions in GO (Gene Ontology) highly enriched with known cancer genes. To evaluate the efficiency of using cancer functions as features to identify cancer genes, we define, in the screened genes, the known protein kinase cancer genes as gold standard positives and the other kinase genes as gold standard negatives. The results show that cancer associated functions are more efficient in identifying cancer genes than the selection pressure feature. Furthermore, combining cancer functions with the number of non-silent mutations can generate more reliable positive predictions. Finally, with precision 0.42, we suggest a list of 46 kinase genes as candidate cancer genes which are annotated to cancer functions and carry at least 3 non-silent mutations.
LI YanHui1, GUO Zheng1,2, PENG ChunFang2, LIU Qing2, MA WenCai2, WANG Jing2, YAO Chen2, ZHANG Min2 & ZHU Jing1 1 Bioinformatics Centre, School of Life Science, University of Electronic Science and Technology of China, Chengdu 610054, China