Similar articles
 20 similar articles found; search took 31 ms
1.
We have developed a new method for predicting protein secondary structure from the amino acid sequence. The method is based on the most recent version (IV) of the standard GOR (J Mol Biol 120 (1978) 97) algorithm. A significant improvement is obtained by combining multiple sequence alignments with the GOR method. Additional improvement is obtained by a simple correction of the results when helices or sheets are too short, or when helices and sheets are direct neighbors along the sequence (we require at least one residue of coil state between them). Requiring that the prediction be strong enough, i.e. that the difference between the probability of the predicted (most probable) state and the probability of the second most probable state be larger than a certain minimum value, also significantly improves secondary structure predictions. We have tested our method on 12 different proteins from the Protein Data Bank with known secondary structures. The average quality of the GOR prediction of the secondary structure for these 12 proteins without multiple sequence alignment was 63.4%. Multiple sequence alignments improve the average prediction to 71.9%. The correction for short helices and sheets and for coil states separating sheets and helices further improves the average prediction to 74.4%. Setting a 10% minimum difference between the most probable and the second most probable conformation leads to 77.0% accuracy of the prediction, while increasing this limit to 20% raises the average accuracy of the secondary structure prediction to 81.2%.
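
The "strong enough" requirement in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the three-letter state alphabet and the example probabilities are assumptions made for illustration, and residues that fail the threshold are simply left uncalled, since the abstract does not say how they are treated.

```python
from typing import List, Optional, Sequence

STATES = "HEC"  # assumed order: helix, sheet, coil


def predict_with_margin(probs: Sequence[Sequence[float]],
                        min_margin: float) -> List[Optional[str]]:
    """Call the most probable state per residue, or None when the gap
    between the two most probable states is not larger than min_margin."""
    calls: List[Optional[str]] = []
    for p in probs:
        ranked = sorted(zip(p, STATES), reverse=True)
        (best_p, best_state), (second_p, _) = ranked[0], ranked[1]
        calls.append(best_state if best_p - second_p > min_margin else None)
    return calls


def accuracy_on_called(calls: Sequence[Optional[str]], observed: str) -> float:
    """Fraction of correct calls among residues that received a call."""
    pairs = [(c, o) for c, o in zip(calls, observed) if c is not None]
    if not pairs:
        return 0.0
    return sum(c == o for c, o in pairs) / len(pairs)


# Hypothetical per-residue probabilities in the order H, E, C.
example = [(0.70, 0.20, 0.10), (0.40, 0.35, 0.25), (0.30, 0.25, 0.45)]
print(predict_with_margin(example, 0.10))
print(predict_with_margin(example, 0.20))
```

Raising `min_margin` trades coverage for reliability: fewer residues receive a call, which is one plausible reading of why the reported accuracy rises as the limit goes from 10% to 20%.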

2.
The use of multiple sequence alignments for secondary structure predictions is analysed. Seven different protein families, containing only sequences of known structure, were considered to provide a range of alignment and prediction conditions. Using alignments obtained by spatial superposition of main chain atoms in known tertiary protein structures allowed a mean improvement of 8% in secondary structure prediction accuracy, when compared to those obtained from the individual sequences. Substitution of these alignments by those determined directly from an automated sequence alignment algorithm showed variations in the prediction accuracy which correlated with the quality of the multiple alignments and distance of the primary sequence. Secondary structure predictions can be reliably improved using alignments from an automatic alignment procedure with a mean increase of 6.8%, giving an overall prediction accuracy of 68.5%, if there is a minimum of 25% sequence identity between all sequences in a family.

3.
Computational sequence design methods are used to engineer proteins with desired properties such as increased thermal stability and novel function. In addition, these algorithms can be used to identify an envelope of sequences that may be compatible with a particular protein fold topology. In this regard, we hypothesized that sequence-property prediction, specifically of secondary structure, could be significantly enhanced by using a large database of computationally designed sequences. We performed a large-scale test of this hypothesis with 6511 diverse protein domains and 50 designed sequences per domain. After analysis of the inherent accuracy of the designed sequences database, we realized that it was necessary to put constraints on what fraction of the native sequence should be allowed to change. With mutational constraints, accuracy was improved vs. no constraints, but the diversity of designed sequences, and hence the effective size of the database, was moderately reduced. Overall, the best three-state prediction accuracy (Q(3)) that we achieved was nearly a percentage point better than that obtained with a natural sequence database alone, well below the theoretical possibility for improvement of 8-10 percentage points. Furthermore, our nascent method was used to augment the state-of-the-art PSIPRED program by a percentage point.

4.
5.
The hydration properties of a protein are important determinants of its structure and function. Here, modular neural networks are employed to predict ordered hydration sites using protein sequence information. First, secondary structure and solvent accessibility are predicted from sequence with two separate neural networks. These predictions are used as input together with protein sequences for networks predicting hydration of residues, backbone atoms and sidechains. These networks are trained with protein crystal structures. The prediction of hydration is improved by adding information on secondary structure and solvent accessibility and, using actual values of these properties, residue hydration can be predicted to 77% accuracy with a Matthews coefficient of 0.43. However, predicted property data with an accuracy of 60-70% result in less than half the improvement in predictive performance observed using the actual values. The inclusion of property information allows a smaller sequence window to be used in the networks to predict hydration. It has a greater impact on the accuracy of hydration site prediction for backbone atoms than for sidechains and for non-polar than polar residues. The networks provide insight into the mutual interdependencies between the location of ordered water sites and the structural and chemical characteristics of the protein residues.
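
The abstract reports both an accuracy and a Matthews coefficient. As a reminder of how the two differ, here is a minimal Python sketch of the standard two-class Matthews correlation coefficient computed from confusion-matrix counts. The function names and the counts used below are illustrative assumptions, not data from the paper.

```python
import math


def matthews_coefficient(tp: int, tn: int, fp: int, fn: int) -> float:
    """Standard two-class Matthews correlation coefficient.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical counts for hydrated (positive) vs. non-hydrated residues.
print(accuracy(40, 110, 20, 30), matthews_coefficient(40, 110, 20, 30))
```

Unlike plain accuracy, the coefficient is 0 for a predictor that does no better than chance and 1 only for a perfect one, which makes it informative when the two classes are unevenly populated.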

6.
Most algorithms for protein secondary structure prediction are based on machine learning techniques, e.g. neural networks. Good architectures and learning methods have improved the performance continuously. The introduction of profile methods, e.g. PSI-BLAST, has been a major breakthrough in increasing the prediction accuracy to close to 80%. In this paper, a brute-force algorithm is proposed and the reliability of each prediction is estimated by a z-score based on local sequence clustering. This algorithm is intended to perform well for those secondary structures in a protein whose formation is mainly dominated by the neighboring sequences and short-range interactions. A reliability z-score has been defined to estimate the goodness of a putative cluster found for a query sequence in a database. The database for prediction was constructed from experimentally determined, non-redundant protein structures with <25% sequence homology, a list maintained by PDBSELECT. Our test results have shown that this new algorithm, belonging to what is known as nearest neighbor methods, performed very well within the expectation of previous methods and that the reliability z-score as defined was correlated with the reliability of prediction. This led to the possibility of making very accurate predictions for a few selected residues in a protein with an accuracy measure of Q3 > 80%. The further development of this algorithm, and a nucleation mechanism for protein folding are suggested. Received March 27, 2003; revised June 30, 2003; accepted August 22, 2003.
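
The abstract does not give the formula for its reliability z-score. The Python sketch below therefore uses the generic definition of a z-score (how many standard deviations a cluster's score lies above a background distribution) purely to illustrate the idea. The function name, the background scores and the use of a population standard deviation are all assumptions.

```python
import statistics
from typing import Sequence


def z_score(score: float, background: Sequence[float]) -> float:
    """Generic z-score: distance of `score` from the background mean,
    in units of the background (population) standard deviation."""
    mean = statistics.mean(background)
    spread = statistics.pstdev(background)
    if spread == 0:
        raise ValueError("background scores have zero spread")
    return (score - mean) / spread


# Hypothetical similarity scores of a query window against database fragments.
background_scores = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(z_score(8.0, background_scores))
```

In a nearest neighbor setting, a prediction would be reported only when the z-score of the best matching cluster exceeds some cutoff, which is consistent with the abstract's claim that high accuracy is reached for a few selected residues.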

7.
8.
Secondary structure prediction for modelling by homology   (total citations: 1; self-citations: 0; citations by others: 1)
An improved method of secondary structure prediction has been developed to aid the modelling of proteins by homology. Selected data from four published algorithms are scaled and combined as a weighted mean to produce consensus algorithms. Each consensus algorithm is used to predict the secondary structure of a protein homologous to the target protein and of known structure. By comparison of the predictions to the known structure, accuracy values are calculated and a consensus algorithm chosen as the optimum combination of the composite data for prediction of the homologous protein. This customized algorithm is then used to predict the secondary structure of the unknown protein. In this manner the secondary structure prediction is initially tuned to the required protein family before prediction of the target protein. The method improves statistical secondary structure prediction and can be incorporated into more comprehensive systems such as those involving consensus prediction from multiple sequence alignments. Thirty one proteins from five families were used to compare the new method to that of Garnier, Osguthorpe and Robson (GOR) and sequence alignment. The improvement over GOR is naturally dependent on the similarity of the homologous protein, varying from a mean of 3% to 7% with increasing alignment significance score.
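
The tuning procedure described above (combine scaled scores as a weighted mean, then keep the weighting that best reproduces a homolog of known structure) can be sketched in Python as follows. This is an illustration under stated assumptions, not the published method: the data layout, the function names, the candidate weight sets and the use of simple per-residue agreement as the accuracy value are all hypothetical.

```python
from typing import Dict, List, Sequence

STATES = "HEC"  # assumed three-state alphabet: helix, sheet, coil
Scores = List[Dict[str, float]]  # one {state: scaled score} dict per residue


def consensus(per_method: Sequence[Scores], weights: Sequence[float]) -> str:
    """Weighted mean of the methods' scaled scores, then the top state.
    Assumes the weights are non-negative and do not sum to zero."""
    total = sum(weights)
    out = []
    for i in range(len(per_method[0])):
        mean = {s: sum(w * m[i][s] for w, m in zip(weights, per_method)) / total
                for s in STATES}
        out.append(max(STATES, key=lambda s: mean[s]))
    return "".join(out)


def agreement(pred: str, known: str) -> float:
    """Fraction of residues whose predicted state matches the known one."""
    return sum(p == k for p, k in zip(pred, known)) / len(known)


def tune_weights(homolog_scores: Sequence[Scores], homolog_structure: str,
                 candidates: Sequence[Sequence[float]]) -> Sequence[float]:
    """Pick the candidate weighting that best predicts the homolog."""
    return max(candidates,
               key=lambda w: agreement(consensus(homolog_scores, w),
                                       homolog_structure))


# Two hypothetical methods scored on a two-residue homolog of known structure "HE".
method_a = [{"H": 0.9, "E": 0.05, "C": 0.05}, {"H": 0.6, "E": 0.3, "C": 0.1}]
method_b = [{"H": 0.2, "E": 0.1, "C": 0.7}, {"H": 0.1, "E": 0.8, "C": 0.1}]
best = tune_weights([method_a, method_b], "HE", [(1, 0), (0, 1), (1, 1)])
print(best)
```

The selected weights would then be applied unchanged to the target protein's scores via `consensus`, which mirrors the abstract's idea of tuning to the protein family before predicting the target.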

9.
Secondary structure prediction from amino acid sequence is a key component of protein structure prediction, with current accuracy at ~75%. We analysed two state-of-the-art secondary structure prediction methods, PHD and JPRED, comparing predictions with secondary structure assigned by the algorithms DSSP and STRIDE. The specific focus of our study was

10.
We present a method whose purpose is to post-process the fuzzy results of secondary structure prediction methods that use multiple sequence alignments, in order to obtain 'realistic' secondary structures, i.e., secondary structure elements whose length is greater than or equal to some predefined minimum length. This regularization helps with interpretation of the secondary structure prediction.
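
The abstract does not describe how the regularization is carried out. As one simple way to enforce a minimum element length, the Python sketch below rewrites helix or sheet runs that are too short as coil. This is a swapped-in, much cruder technique than whatever the authors use; the function name and the minimum lengths are assumptions.

```python
from itertools import groupby
from typing import Dict


def enforce_min_length(pred: str, min_len: Dict[str, int], coil: str = "C") -> str:
    """Replace every non-coil run shorter than its minimum length with coil.

    `pred` is a per-residue state string such as "CCHHHHCC"; states that are
    missing from `min_len` are left untouched.
    """
    out = []
    for state, run in groupby(pred):
        n = len(list(run))
        if state != coil and n < min_len.get(state, 1):
            out.append(coil * n)   # too short: treat as coil
        else:
            out.append(state * n)  # long enough, or already coil
    return "".join(out)


# Hypothetical minimum lengths: 4 residues for a helix, 3 for a sheet.
print(enforce_min_length("CCHHCCEEEEECHHHHHC", {"H": 4, "E": 3}))
```

A single pass like this never creates new helices or sheets, so the output length always equals the input length and every surviving element meets its minimum.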

11.
Evaluation and improvements in the automatic alignment of protein sequences   (total citations: 1; self-citations: 0; citations by others: 1)
The accuracy of protein sequence alignment obtained by applying a commonly used global sequence comparison algorithm is assessed. Alignments based on the superposition of the three-dimensional structures are used as a standard for testing the automatic, sequence-based methods. Alignments obtained from the global comparison of five pairs of homologous protein sequences studied gave 54% agreement overall for residues in secondary structures. The inclusion of information about the secondary structure of one of the proteins in order to limit the number of gaps inserted in regions of secondary structure, improved this figure to 68%. A similarity score of greater than six standard deviation units suggests that an alignment which is greater than 75% correct within secondary structural regions can be obtained automatically for the pair of sequences.

12.
A new method for predicting protein secondary structure from amino acid sequence has been developed. The method is based on multiple sequence alignment of the query sequence with all other sequences with known structure from the Protein Data Bank (PDB) by using BLAST. The fragments of the alignments belonging to proteins from the PDB are then used for further analysis. We have studied various schemes of assigning weights for matching segments and calculated normalized scores to predict one of the three secondary structures: α-helix, β-sheet, or coil. We applied several artificial intelligence techniques, namely decision trees (DT), neural networks (NN) and support vector machines (SVM), to improve the accuracy of predictions and found that SVM gave the best performance. Preliminary data show that combining the fragment mining approach with GOR V (Kloczkowski et al, Proteins 49 (2002) 154-166) for regions of low sequence similarity improves the prediction accuracy.

13.
The amino acid residues on a protein surface play a key role in interaction with other molecules, determine many physical properties, and constrain the structure of the folded protein. A database of monomeric protein crystal structures was used to teach computer-simulated neural networks rules for predicting surface exposure from local sequence. These trained networks are able to correctly predict surface exposure for 72% of residues in a testing set using a binary model (buried/exposed) and for 54% of residues using a ternary model (buried/intermediate/exposed). In the ternary model, only 11% of the exposed residues are predicted as buried and only 5% of the buried residues are predicted as exposed. Also, since the networks are able to predict exposure with a quantitative confidence estimate, it is possible to assign exposure for over half of the residues in a binary model with >80% accuracy. Even more accurate predictions are obtained by making a consensus prediction of exposure for a homologous family. The effect of the local environment of an amino acid on its accessibility, though smaller than expected, is significant and accounts for the higher success rate of prediction than obtained with previously used criteria. In the absence of a three-dimensional structure, the ability to predict surface accessibility of amino acids directly from the sequence is a valuable tool in choosing sites of chemical modification or specific mutations and in studies of molecular interaction.

14.
The inverse folding approach is a powerful tool in protein structure prediction when the native state of a sequence adopts one of the known protein folds. This is because some proteins show strong sequence-structure specificity in inverse folding experiments that allow gaps and insertions in the sequence-structure alignment. In those cases when structures similar to their native folds are included in the structure database, the z-scores (which measure the sequence-structure specificity) of these folds are well separated from those of other alternative structures. In this paper, we seek to understand the origin of this sequence-structure specificity and to identify how the specificity arises on passing from a short peptide chain to the entire protein sequence. To accomplish this objective, a simplified version of inverse folding, gapless inverse folding, is performed using sequence fragments of different sizes from 53 proteins. The results indicate that usually a significant portion of the entire protein sequence is necessary to show sequence-structure specificity, but there are regions in the sequence that begin to show this specificity at relatively short fragment size (15-20 residues). An island picture, in which the regions in the sequence that recognize their own native structure grow from some seed fragments, is observed as the fragment size increases. Usually, more similar structures to the native states are found in the top-scoring structural fragments in these high-specificity regions.

15.
Local protein sequence similarity does not imply a structural relationship   (total citations: 1; self-citations: 0; citations by others: 1)
A database search often will find a seemingly strong sequence similarity between two fragments of proteins that are not expected to have an evolutionary or functional relationship. It is tempting to suggest that the two fragments will adopt a similar conformation due to a common pattern of residues that dictate a particular substructure. To investigate the likelihood of such a structural similarity, local sequence similarities between proteins of known conformation were identified by a standard database search algorithm. Significant sequence similarity was identified as when the chance probability of obtaining the relatedness score from a scan of the entire database was <1%. In this region both true homologies and false homologies are detected. A total of 69 false homologies was located of length between 20 and 262 aligned positions. Many of these alignments had 25% sequence identity and a further 25% of conservative changes. However, the results show in general these aligned fragments did not have a significant similarity in secondary or tertiary structure. Thus local sequence does not indicate a structural similarity when there is neither an evolutionary nor functional explanation to support this. Accordingly structure predictions based on finding a local sequence similarity with an evolutionary unrelated protein of known conformation are unlikely to be valid.

16.
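Objective: To construct a cDNA library of adult Clonorchis sinensis and screen it for functional genes. Methods: A cDNA library of adult Clonorchis sinensis was constructed with the SMART method and subjected to large-scale EST sequencing. Bioinformatics methods were then used to compare the EST sequences with sequences deposited in GenBank for homology, to assemble the sequences, and to assess gene completeness. RPSBLAST at NCBI was used to search and align the conserved domains of the selected gene, and PredictProtein was used to analyse and predict its functional domains and secondary structure. Results: A calcineurin B subunit-like protein (CsCBLP) gene was selected from the adult Clonorchis sinensis cDNA library. Homology analysis showed that CsCBLP shares 53% homology with Rattus norvegicus calcineurin (CaN), 40% with Naegleria gruberi CaNB, and 51% with a binding protein from the genus Caenorhabditis. Conserved domain alignment and the prediction of functional domains and secondary structure indicate that the selected CsCBLP is an analogue of the calcineurin B subunit and belongs to the calcium-binding protein family. Conclusion: A calcineurin B subunit-like protein gene was identified from the adult Clonorchis sinensis cDNA library using bioinformatics methods.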

17.
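Objective: To analyse the cDNA and amino acid sequences of a protein phosphatase 2A (PP2A) activator-like protein from Eisenia fetida. Methods: The target cDNA sequence was obtained by random sequencing of an Eisenia fetida cDNA library, and the gene and amino acid sequences were analysed with gene and protein analysis tools including DNAMAN, NCBI ORF finder, BLAST, Conserved Domains, GOR, SWISS-MODEL and PDB. Results: The cDNA sequence of the Eisenia fetida PP2A activator-like protein is 547 bp long and encodes 87 amino acid residues. The gene sequences most similar to it are mostly PP2A activator-like gene sequences from other species; its nucleotide similarity to Crassostrea virginica serine/threonine-protein phosphatase 2A activator-like mRNA is 67%. The amino acid sequences most similar to it are mostly PP2A activator-like protein sequences from other species; its amino acid similarity to serine/threonine-protein phosphatase 2A activator (Crassostrea gigas) is 70%. The target sequence contains a conserved PTPA superfamily domain, the secondary structure consists mainly of α-helix and random coil, and the tertiary structure prediction is consistent with the secondary structure. Conclusion: The Eisenia fetida PP2A activator-like protein analysed in this study shows high homology to the oyster PP2A activator-like protein and to PTPA, which is present in all eukaryotes and highly conserved; it may exert an indirect tumour-suppressing effect by specifically activating the tumour suppressor PP2A.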

18.
We have analyzed the performance of majority voting on minimal combination sets of three state-of-the-art secondary structure prediction methods in order to obtain a consensus prediction. Using three large benchmark sets from the EVA server, our results show a significant improvement in the average Q3 prediction accuracy of up to 1.5 percentage points by consensus formation. The application of an additional trivial filtering procedure for predicted secondary structure elements that are too short, does not significantly affect the prediction accuracy. Our analysis also provides valuable insight into the similarity of the results of the prediction methods that we combine as well as the higher confidence in consistently predicted secondary structure. Received March 7, 2003; revised May 24, 2003; accepted June 6, 2003.
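
Per-residue majority voting over three predictions, as described above, can be sketched in a few lines of Python. The function name and the tie-breaking rule (fall back to one designated method when all three disagree) are assumptions; the abstract does not say how such cases are resolved.

```python
from collections import Counter
from typing import Sequence


def majority_vote(predictions: Sequence[str], fallback_index: int = 0) -> str:
    """Per-residue majority vote over equally long prediction strings.

    When no state wins a strict majority, the call of the method at
    `fallback_index` is used (an assumed tie-breaking rule).
    """
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("all predictions must have the same length")
    out = []
    for column in zip(*predictions):
        state, count = Counter(column).most_common(1)[0]
        out.append(state if count > len(column) / 2 else column[fallback_index])
    return "".join(out)


# Three hypothetical three-state predictions for the same four residues.
print(majority_vote(["HHEC", "HEEC", "CHEE"]))
```

Counting how often all three inputs agree at a position gives a simple per-residue confidence signal, in line with the abstract's remark about higher confidence in consistently predicted structure.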

19.
The solvent accessibility of each residue is predicted on the basis of the protein sequence. A set of 338 monomeric, non-homologous and high-resolution protein crystal structures is used as a learning set and a jackknife procedure is applied to each entry. The prediction is based on the comparison of the observed and the average values of the solvent-accessible area. It appears that the prediction accuracy is significantly improved by considering the residue types preceding and/or following the residue whose accessibility must be predicted. In contrast, the separate treatment of different secondary structural types does not improve the quality of the prediction. It is furthermore shown that the residue accessibility is much better predicted in small than in larger proteins. Such a discrepancy must be carefully considered in any algorithm for predicting residue accessibility.

20.
OrienTM is a computer software that utilizes an initial definition of transmembrane segments to predict the topology of transmembrane proteins from their sequence. It uses position-specific statistical information for amino acid residues which belong to putative non-transmembrane segments derived from statistical analysis of non-transmembrane regions of membrane proteins stored in the SwissProt database. Its accuracy compares well with that of other popular existing methods. A web-based version of OrienTM is publicly available at the address http://biophysics.biol.uoa.gr/OrienTM.
