Homology detection plays a key role in bioinformatics, and the substitution matrix is one of its most important components. Thus, besides improving alignment algorithms, another effective way to enhance the accuracy of homology detection is to use appropriate substitution matrices or even to construct new ones. Studying the features of various matrices and comparing their performance in homology detection enables us to choose the most suitable matrix for a specific application. In this paper, taking BLOSUM matrices as an example, some detailed features of matrices in homology detection are studied by calculating the distributions of the numbers of recognized proteins over different sequence identities and sequence lengths. Our results clearly show that different matrices have different preferences and abilities in recognizing remote homologous proteins. Furthermore, the detailed features of the various matrices can be used to improve the accuracy of homology detection.
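As a rough illustration of the kind of distribution described above, the following minimal sketch counts recognized proteins per sequence-identity bin for each matrix. The function name `identity_histogram`, the 10% bin width, and the identity values are all hypothetical and chosen for illustration; they are not data or code from the paper.

```python
from collections import Counter


def identity_histogram(hits, bin_width=10):
    """Count recognized proteins per sequence-identity bin.

    hits: iterable of percent identities (0-100), one per recognized protein.
    Returns a dict mapping the lower edge of each bin to its count.
    """
    counts = Counter()
    for identity in hits:
        if not 0 <= identity <= 100:
            raise ValueError(f"identity out of range: {identity}")
        # Put exactly 100% into the top bin instead of opening a new one.
        lower = min(int(identity // bin_width) * bin_width, 100 - bin_width)
        counts[lower] += 1
    return dict(sorted(counts.items()))


# Hypothetical identities of proteins recognized with two matrices.
hits_by_matrix = {
    "BLOSUM45": [12.0, 18.5, 22.0, 27.5, 41.0],
    "BLOSUM62": [18.5, 27.5, 33.0, 41.0, 58.0],
}

for name, hits in hits_by_matrix.items():
    print(name, identity_histogram(hits))
```

Comparing such histograms across matrices is one simple way to see whether a matrix tends to recognize more proteins at low identity; the same binning could be applied to sequence length.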
The folding dynamics and structural characteristics of the peptides RTKAWNRQLYPEW (P1) and RTKQLYPEW (P2) are investigated in this work using all-atom simulations with CHARMM. The results show that P1, a segment of an antigen, has an α-helical folding motif, whereas P2, which is derived by deleting the four residues AWNR from P1, does not form a helix and instead adopts a β-strand. Peptide P1 also experiences a more rugged energy landscape than peptide P2. From our results, it is inferred that the antibody CD8 cytolytic T lymphocyte prefers an antigen with a β-folding structure to one with an α-helical structure.
In this work, we investigate the orientational preferences between amino acids, with the orientation defined from the local geometry of the amino acids concerned. It is found that there are common orientational preferences, (70°, 30°, 140°) and (110°, 340°, 100°), for various pairs of amino acids. Different side chains may strengthen or weaken the common preferences, which is related to the effect of packing. Some amino acids with specific local flexibility may possess additional orientational preferences besides the common ones, such as (10°, 280°, 210°). A further analysis of pairs of amino acids with different secondary-structure preferences shows that the directional interaction may affect the distribution of orientations more effectively than packing or local flexibility. All these results provide some insight into the organization of amino acids in proteins and its relation to the interactions involved.
The orientation between the backbone residues of proteins is defined based on the local configurations, and the corresponding preferences are analyzed statistically. It is found that all residue pairs have specific orientational preferences. The statistical analysis concentrates mainly on the orientational distributions for two kinds of residue groupings, one based on hydrophobicity and the other on secondary-structure features. The statistics for these two types of groupings show different orientational preferences: for the former grouping the orientational preference is rather weak, while for the latter it is strong. This suggests that the formation of local structures and of secondary structures is highly related to the orientational preferences.
Sequence alignment is a common method for finding structurally conserved or similar regions of proteins. However, sequence alignment is often inaccurate if the sequence identity between the sequences to be aligned is less than 30%. This is because, for such sequences, different residues may play similar structural roles, and they are incorrectly aligned when the alignment uses a substitution matrix over all 20 residue types. Based on the similarity of their physicochemical features, residues can be clustered into a few groups. Using such simplified alphabets, the complexity of protein sequences is reduced while the key information encoded in the sequences is retained. As a result, the accuracy of sequence alignment might be improved if the residues are properly clustered. Here, by using a database of aligned protein structures (DAPS), a new clustering method based on substitution scores is proposed for grouping residues, and substitution matrices of residues at different levels of simplification are constructed. The validity of the reduced alphabets is confirmed by relative entropy analysis. The reduced alphabets are applied to the recognition of structurally conserved or similar regions of proteins by sequence alignment. The results indicate that the accuracy or efficiency of sequence alignment can be improved with the optimal reduced alphabet, with N around 9.
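The idea of a reduced alphabet can be sketched as a simple residue-to-group mapping applied before alignment. The nine groups below are a hypothetical, physicochemically motivated grouping chosen only for illustration; they are not the clustering derived from DAPS in the paper, and the helper names `build_mapping` and `reduce_sequence` are likewise assumptions.

```python
# Hypothetical 9-group alphabet for illustration only;
# NOT the grouping obtained by the clustering method in the paper.
GROUPS = ["C", "G", "P", "AST", "DENQ", "KR", "H", "ILMV", "FWY"]


def build_mapping(groups):
    """Map each residue to the first letter of its group (the representative)."""
    mapping = {}
    for group in groups:
        representative = group[0]
        for residue in group:
            if residue in mapping:
                raise ValueError(f"residue {residue} appears in two groups")
            mapping[residue] = representative
    return mapping


def reduce_sequence(sequence, mapping):
    """Rewrite a protein sequence in the reduced alphabet.

    Residues outside the mapping (e.g. ambiguity codes) become 'X'.
    """
    return "".join(mapping.get(residue, "X") for residue in sequence.upper())


mapping = build_mapping(GROUPS)
print(reduce_sequence("RTKQLYPEW", mapping))
```

After this step, two sequences that differ only by substitutions within a group become identical in the reduced alphabet, which is why a well-chosen grouping can help at low sequence identity.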
LI Jing1 & WANG Wei1,2 1 National Laboratory of Solid State Microstructure and Department of Physics, Nanjing University, Nanjing 210093, China
In this work, the traditional potential of mean force (PMF) method is improved for describing protein-protein interactions. The method is developed at the atomic level and is distance-dependent. Compared with the traditional method, our model accounts for the effects of environmental factors in a reasonable way. With this modification, we can obtain more reasonable and accurate pair potentials, which are a prerequisite for precisely describing protein-protein interactions and can help us to recognize the interaction rules of residues in protein systems. Our method can also be applied to other fields of protein science, e.g., protein fold recognition, structure prediction, and prediction of thermostability.
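For orientation, the conventional distance-dependent PMF is usually written in inverse-Boltzmann form, u(r) = -kT ln[f_obs(r) / f_ref(r)], where f_obs and f_ref are the observed and reference frequencies of a pair at distance r. The sketch below implements only this conventional form and not the environment-dependent modification proposed in the paper, whose details the abstract does not give. The function name, the per-bin count inputs, and the default temperature of 300 K are assumptions.

```python
import math

# Gas constant in kcal/(mol*K), i.e. the Boltzmann constant in molar units.
K_B = 0.0019872041


def pmf_from_counts(observed, reference, temperature=300.0):
    """Conventional inverse-Boltzmann pair potential per distance bin.

    observed, reference: pair counts per distance bin (same length).
    Returns u(r) = -kT * ln(f_obs / f_ref) in kcal/mol for each bin,
    or None where a bin has a zero count and the ratio is undefined.
    """
    if len(observed) != len(reference):
        raise ValueError("observed and reference must have the same length")
    obs_total = sum(observed)
    ref_total = sum(reference)
    kt = K_B * temperature
    potentials = []
    for n_obs, n_ref in zip(observed, reference):
        if n_obs == 0 or n_ref == 0:
            potentials.append(None)
            continue
        # Normalize counts to frequencies before taking the ratio.
        ratio = (n_obs / obs_total) / (n_ref / ref_total)
        potentials.append(-kt * math.log(ratio))
    return potentials


# Hypothetical counts for one atom-pair type in two distance bins.
print(pmf_from_counts([30, 10], [20, 20]))
```

Bins where the pair is observed more often than in the reference state receive a negative (favorable) potential, and bins where it is observed less often receive a positive one.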