The advent of DNA microarray technology promises new insight into the secrets of life by monitoring the activities of thousands of genes simultaneously. Current analyses of microarray data focus on the precise classification of biological types, for example, tumor versus normal tissues. A further scientifically challenging task is to extract disease-relevant genes from the bewildering amount of raw data. This is one of the most critical themes of the post-genomic era, but it is generally ignored for lack of an efficient approach. In this paper, we present a novel ensemble method for gene extraction that can be tailored to multiple biological tasks, including (i) precise classification of biological types; (ii) disease gene mining; and (iii) target-driven gene networking. We also give a numerical application for (i) and (ii) using a public microarray data set, and set aside a separate paper to address (iii).
Identifying disease-relevant genes and functional modules, based on gene expression profiles and gene functional knowledge, is of high importance for studying disease mechanisms and subtyping disease phenotypes. Using the gene categories of biological process and cellular component in Gene Ontology, we propose an approach for selecting functional modules enriched with differentially expressed genes and for identifying the feature functional modules with high disease-discriminating ability. Using the differentially expressed genes in each feature module as the feature genes, we reveal the relevance of the modules to the studied diseases. Using three datasets for prostate cancer, gastric cancer, and leukemia, we demonstrate that the proposed modular approach has high power in identifying functionally integrated feature gene subsets that are highly relevant to the disease mechanisms. Our analysis also shows that the critical disease-relevant genes might be better recognized from the gene regulation network constructed from the characterized functional modules, which gives important clues to the concerted mechanisms by which the modules respond to complex disease states. In addition, selecting disease-relevant genes by jointly considering gene functional knowledge suggests a new way of precisely classifying disease samples with clear biological interpretations, which is critical for clinical diagnosis and for elucidating the pathogenic basis of complex diseases.
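As an illustration of what "enriched with differentially expressed genes" can mean in practice, the following minimal sketch scores a module with a one-sided hypergeometric test. The abstract does not state which enrichment statistic is used, so the hypergeometric test, the function name `enrichment_p_value`, and the toy module counts are all assumptions made for illustration only.

```python
from math import comb

def enrichment_p_value(n_genes, n_de, module_size, de_in_module):
    """One-sided hypergeometric tail P(X >= de_in_module).

    n_genes      : genes measured on the array (hypothetical background)
    n_de         : differentially expressed (DE) genes among them
    module_size  : genes annotated to the functional module
    de_in_module : DE genes observed in the module
    """
    upper = min(n_de, module_size)
    total = comb(n_genes, module_size)
    # Sum the probability of seeing k DE genes for every k at least as
    # large as the observed count.
    tail = sum(
        comb(n_de, k) * comb(n_genes - n_de, module_size - k)
        for k in range(de_in_module, upper + 1)
    )
    return tail / total

# Toy numbers, invented for illustration: 1000 genes, 50 of them DE.
modules = {"module_a": (20, 8), "module_b": (20, 1)}  # (size, DE count)
p_values = {
    name: enrichment_p_value(1000, 50, size, hits)
    for name, (size, hits) in modules.items()
}
for name, p in sorted(p_values.items(), key=lambda item: item[1]):
    print(f"{name}: p = {p:.3g}")
```

In this toy setting the module with 8 of 20 genes differentially expressed receives a much smaller p-value than the module with 1 of 20; a real analysis would also need a multiple-testing correction across modules.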
GESTs (gene expression similarity and taxonomy similarity), a gene function prediction approach that we proposed previously, is based on gene expression similarity and on the concept similarity of functional classes defined in Gene Ontology (GO). In this paper, we extend the method to protein-protein interaction data by introducing several methods for filtering the neighbors, in protein interaction networks, of a protein of unknown function(s). Unlike conventional methods, the proposed approach automatically selects the most appropriate functional classes, as specific as possible, during the learning process, and calls on genes annotated to nearby classes to support predictions for small, specific classes in GO. Based on the yeast protein-protein interaction information from MIPS and a dataset of gene expression profiles, we assess the performance of our approach in predicting protein functions in the "biological process" category using three measures designed specifically for functional classes organized in GO. The results show that our method is powerful for predicting gene functions broadly with very specific functional terms. Based on the GO database published in December 2004, we predict functions for some proteins whose functions were unknown at that time, and some of these predictions have been confirmed by the new SGD annotation data published in April 2006.
GAO Lei1, LI Xia1,2, GUO Zheng1,2, ZHU MingZhu1, LI YanHui1 & RAO ShaoQi1,3 1 Department of Bioinformatics, Harbin Medical University, Harbin 150086, China
Reconstruction of genetic networks is one of the key scientific challenges in functional genomics. This paper describes a novel approach for addressing the regulatory dependencies between genes whose activities can be delayed by multiple units of time. The aim of the proposed approach, termed TdGRN (time-delayed gene regulatory networking), is to reverse-engineer the dynamic mechanisms of gene regulation. This is realized by identifying time-delayed gene regulations through supervised decision-tree analysis of a newly designed time-delayed gene expression matrix, derived from the original time-series microarray data. A permutation technique is used to determine the statistical classification threshold of a tree, from which one or more gene regulatory rules are extracted. The proposed TdGRN is a model-free approach that attempts to learn the underlying regulatory rules without relying on any model assumptions. Compared with model-based approaches, it has several significant advantages: it requires neither an arbitrary threshold for discretizing gene transcriptional values nor a predefined number of regulators (k). We have applied this novel method to publicly available data on the budding yeast cell cycle. The numerical results demonstrate that most of the identified time-delayed gene regulations are supported by current biological knowledge.
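The idea of a time-delayed expression matrix with a permutation-derived threshold can be sketched as follows. This is only a rough illustration under stated assumptions, not the TdGRN implementation: the abstract does not give the matrix layout, so here each row holds every gene's value at delays 1..max_delay before time t; the class label (whether the target rose since the previous time point) is an assumed choice; and a one-split decision stump stands in for the full decision tree. All function names are hypothetical.

```python
import random

def time_delayed_matrix(series, target, max_delay):
    """series maps gene -> list of expression values over time points.

    Returns (names, rows, labels): names lists (gene, delay) columns,
    each row holds the delayed values preceding time t, and the label
    is 1 if the target rose from t-1 to t (an assumed labelling).
    """
    genes = sorted(series)
    names = [(g, d) for g in genes for d in range(1, max_delay + 1)]
    rows, labels = [], []
    for t in range(max_delay, len(series[target])):
        rows.append([series[g][t - d] for g, d in names])
        labels.append(int(series[target][t] > series[target][t - 1]))
    return names, rows, labels

def best_stump(rows, labels):
    """Best one-split rule as (accuracy, column index, threshold)."""
    n = len(labels)
    best = (0.0, None, None)
    for j in range(len(rows[0])):
        values = sorted(set(r[j] for r in rows))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            hits = sum((r[j] > thr) == bool(y) for r, y in zip(rows, labels))
            acc = max(hits, n - hits) / n  # allow either rule direction
            if acc > best[0]:
                best = (acc, j, thr)
    return best

def permutation_threshold(rows, labels, n_perm=200, alpha=0.05, seed=0):
    """(1 - alpha) quantile of stump accuracy under shuffled labels."""
    rng = random.Random(seed)
    shuffled = list(labels)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null.append(best_stump(rows, shuffled)[0])
    null.sort()
    return null[min(n_perm - 1, int((1 - alpha) * n_perm))]

# Toy series, invented for illustration: B copies A with a delay of 2.
a = [0.1, 0.9, 0.2, 0.1, 0.8, 0.9, 0.1, 0.7, 0.2, 0.1, 0.9, 0.3]
toy = {"A": a, "B": [0.2, 0.2] + a[:-2]}
names, rows, labels = time_delayed_matrix(toy, "B", max_delay=3)
acc, col, thr = best_stump(rows, labels)
cutoff = permutation_threshold(rows, labels)
print("best rule:", names[col], "threshold", thr, "accuracy", acc)
print("permutation cutoff:", cutoff, "rule kept:", acc > cutoff)
```

A rule would be reported only when its accuracy exceeds the permutation cutoff. Note that the split threshold is learned from the data rather than fixed in advance, which is one way to read the abstract's claim that no arbitrary discretization threshold is required.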
JIANG Wei1,2, LI Xia1,2,3,4, GUO Zheng1,2,3, LI Chuanxing1, WANG Lihong1 & RAO Shaoqi1,5 1. Department of Bioinformatics, Harbin Medical University, Harbin 150086, China
Proteins rarely function in isolation inside or outside cells; instead, they operate as part of a highly interconnected cellular network called the interaction network. Analyzing the properties of drug-target proteins in this biological network is therefore especially helpful for understanding the mechanism of drug action from an informatics perspective. At present, no detailed characterization of the topological features of drug-target proteins in the human protein-protein interaction (PPI) network is available. In this work, by mapping the drug targets in DrugBank onto the interaction network of human proteins, we analyzed five topological indices of drug targets and compared them with those of the whole protein interactome set and the non-drug-target set. The results showed that drug-target proteins have higher connectivity and quicker communication with each other in the PPI network. Based on these features, all proteins in the interaction network were ranked. Of the top 100 proteins, 48 are covered by DrugBank; of the remaining 52 proteins, 9 are drug-target proteins covered by the TTD, Matador and other databases, while others have been demonstrated to be drug-target proteins in the literature.
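The comparison of topological features between drug targets and other proteins can be illustrated with a small sketch. The abstract does not name its five indices, so the two used here (mean degree as a proxy for connectivity, mean pairwise shortest-path length as a proxy for how quickly proteins communicate) are assumptions, and the toy network with nodes T1..T3 and P1..P4 is invented for illustration.

```python
from collections import deque

def build_graph(edges):
    """Undirected adjacency sets from a list of (u, v) interactions."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph

def mean_degree(graph, nodes):
    """Average number of interaction partners over the given nodes."""
    return sum(len(graph[n]) for n in nodes) / len(nodes)

def bfs_distances(graph, source):
    """Shortest-path length from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def mean_pairwise_distance(graph, nodes):
    """Average shortest-path length over connected pairs within nodes."""
    nodes = list(nodes)
    total, pairs = 0, 0
    for i, u in enumerate(nodes):
        dist = bfs_distances(graph, u)
        for v in nodes[i + 1:]:
            if v in dist:  # skip pairs in different components
                total += dist[v]
                pairs += 1
    return total / pairs if pairs else float("inf")

# Toy network: three hypothetical targets (T*) and four other proteins (P*).
edges = [("T1", "T2"), ("T1", "T3"), ("T2", "T3"),
         ("T1", "P1"), ("T2", "P2"), ("T3", "P3"), ("P3", "P4")]
graph = build_graph(edges)
targets = ["T1", "T2", "T3"]
others = ["P1", "P2", "P3", "P4"]

print("mean degree:", mean_degree(graph, targets), mean_degree(graph, others))
print("mean distance:", mean_pairwise_distance(graph, targets),
      mean_pairwise_distance(graph, others))
```

In this toy network the target set has the higher mean degree and the shorter mean distance, which mirrors the direction of the reported finding; a ranking of all proteins could then combine such indices, although the scoring scheme used in the study is not described in the abstract.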
ZHU MingZhu1, GAO Lei1, LI Xia1,2 & LIU ZhiCheng1 1 School of Biomedical Engineering, Capital Medical University, Beijing 100069, China