Similar literature
19 similar documents found
1.
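The encoding scheme is one of the important factors affecting the accuracy of protein secondary structure prediction. For single-sequence protein secondary structure prediction, a new combined encoding method is proposed. The encoding classifies amino acids according to their propensity factors for each type of secondary structure and their hydrophobicity values, and represents each class of amino acid in binary form. Under the same experimental conditions, the CB513 dataset is first encoded with different encoding schemes, and a support vector machine is then trained to build the prediction model. The experimental results show that the prediction accuracy of the proposed encoding is 1.48% and 10.68% higher than that of 20-bit orthogonal encoding and 5-bit encoding, respectively. The encoding is therefore well suited to structure prediction for non-homologous or low-homology proteins.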
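
The 20-bit orthogonal encoding used as a baseline in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the residue ordering, the window half-width of 6 and the zero-padding at the sequence ends are all assumptions, since the abstract does not state them.

```python
# Assumed ordering of the 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def orthogonal_encode(residue):
    """Return a 20-bit orthogonal (one-hot) code; all zeros for unknown residues."""
    return [1 if residue == aa else 0 for aa in AMINO_ACIDS]


def encode_window(sequence, center, half_width=6):
    """Concatenate the codes of all residues in a window around `center`.

    Window positions that fall outside the sequence are padded with
    all-zero vectors (an assumed convention, not taken from the abstract).
    """
    features = []
    for pos in range(center - half_width, center + half_width + 1):
        if 0 <= pos < len(sequence):
            features.extend(orthogonal_encode(sequence[pos]))
        else:
            features.extend([0] * len(AMINO_ACIDS))
    return features
```

Each window yields a vector of (2 * half_width + 1) * 20 values, which could then be passed to a classifier such as a support vector machine.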

2.
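Research on protein secondary structure prediction methods   Total citations: 2 (self-citations: 2, citations by others: 0)
To improve the accuracy of protein secondary structure prediction, a new network model and encoding method are proposed. First, the global search capability of gene expression programming (GEP) is used to evolve the structure and connection weights of a neural network simultaneously. Second, the input-layer encoding of the neural network is improved by adding the hydrophobic environment of each amino acid residue. Tests on 36 proteins from PDBSelect25, totaling 6,122 residues, show that the proposed network model and encoding method effectively improve the accuracy of protein secondary structure prediction.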

3.
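Evaluation of protein secondary structure prediction methods   Total citations: 5 (self-citations: 3, citations by others: 5)
Protein structure prediction is an important task in the post-genomic era, and protein secondary structure prediction is a key step in it. It is now generally believed that if secondary structure prediction accuracy reaches 80%, the three-dimensional structure of a protein molecule can be predicted with basic accuracy. Methods for protein secondary structure prediction keep emerging, and the number of websites offering secondary structure prediction is also growing. To give researchers a reference when choosing among these methods, this article tests 10 important and effective methods against a uniform standard and evaluates and analyzes them on that basis. The 10 methods are GOR I, PROF, GOR IV, NNPREDICT, PHDsec, SSpro v 2.0, PSIPRED, PREDATOR, SOPMA and APSSP2. The comparison shows that APSSP2, SSpro v 2.0 and PSIPRED predict best and can be the first choices in practice, with APSSP2 giving the best results of all.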

4.
In many pattern recognition applications, high-dimensional feature vectors impose a high computational cost as well as the risk of “overfitting”. Feature selection addresses the dimensionality reduction problem by determining the subset of available features that is most essential for classification. This paper presents a novel feature selection method named filtered and supported sequential forward search (FS_SFS) in the context of support vector machines (SVM). In comparison with conventional wrapper methods that employ the SFS strategy, FS_SFS has two important properties that reduce computation time. First, it dynamically maintains a subset of samples for training the SVM. Because not all the available samples participate in the training process, the computational cost of obtaining a single SVM classifier is decreased. Second, a new criterion, which takes into consideration both the discriminant ability of individual features and the correlation between them, is proposed to effectively filter out nonessential features. As a result, the total number of training runs is significantly reduced and the overfitting problem is alleviated. The proposed approach is tested on both synthetic and real data to demonstrate its effectiveness and efficiency.
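
For orientation, the conventional sequential forward search (SFS) wrapper that this abstract takes as its starting point can be sketched as below. This is the plain greedy strategy only, not the FS_SFS method itself: the sample filtering and the feature pre-filtering criterion are not reproduced, and the `score` callback (for example, a cross-validated classifier accuracy) is a hypothetical caller-supplied function.

```python
def sequential_forward_search(n_features, score, k):
    """Greedily grow a feature subset until it holds k features.

    `score(subset)` is a caller-supplied function (hypothetical here) that
    returns a quality value for a list of feature indices; higher is better.
    """
    selected = []
    remaining = list(range(n_features))
    while remaining and len(selected) < k:
        # Try adding each remaining feature and keep the one that helps most.
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected
```

Every candidate evaluation calls `score` once, so a wrapper that trains a classifier inside `score` pays one training run per candidate per step. That cost is what the two properties described in the abstract aim to reduce.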

5.
Prediction of protein secondary structure is considered to be an important step toward elucidating the three-dimensional structure and function of proteins. We have developed a multimodal neural network (MNN) to predict protein secondary structure. The MNN is composed of several subclassifiers for single-state predictions using neural networks and a decision neural network (DNN). Each subclassifier employs a number of subnetworks to predict a single state of the secondary structure individually and produces its final result by majority decision. The DNN uses a three-layer neural network to produce the final overall prediction from the outputs of the single-state predictions. The MNN gives an overall accuracy of 71.1% with corresponding Matthews correlation coefficients of CH = 0.62 and CE = 0.53. The prediction test is based on a database of 126 nonhomologous protein sequences. This work was presented, in part, at the 8th International Symposium on Artificial Life and Robotics, Oita, Japan, January 24–26, 2003.
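
The two kinds of figure quoted in this abstract, overall three-state accuracy and per-state Matthews correlation coefficients, can be computed as in the sketch below. It assumes the true and predicted structures are given as equal-length strings over the labels H (helix), E (strand) and C (coil); the function names are illustrative and not taken from the paper.

```python
import math


def q3_accuracy(true_states, predicted_states):
    """Fraction of residues whose predicted state matches the true state."""
    correct = sum(t == p for t, p in zip(true_states, predicted_states))
    return correct / len(true_states)


def matthews_coefficient(true_states, predicted_states, state):
    """Matthews correlation coefficient for one state versus all others."""
    tp = tn = fp = fn = 0
    for t, p in zip(true_states, predicted_states):
        if p == state and t == state:
            tp += 1
        elif p == state:
            fp += 1
        elif t == state:
            fn += 1
        else:
            tn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Return 0.0 when the coefficient is undefined (a common convention).
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

For example, `matthews_coefficient(true_states, predicted_states, "H")` gives the helix coefficient corresponding to the CH value reported above.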

6.
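Predicting protein secondary structure with a hybrid ANN/HMM model   Total citations: 1 (self-citations: 1, citations by others: 0)
To address the low accuracy of the 3-state hidden Markov model (HMM) in predicting protein secondary structure, a 15-state HMM is proposed and combined, through an improved algorithm, with a BP neural network for secondary structure prediction. The study uses 492 protein sequences selected from the CB513 dataset, randomly divided evenly into 7 groups. The hybrid model is applied for prediction and its accuracy is assessed by 7-fold cross-validation, giving a Q3 accuracy of 77.21% and an SOV value of 72.52%. The results show that the hybrid model both takes full account of the interactions between neighboring amino acid residues and captures, to some extent, the long-range correlations of secondary structure, which leads to good prediction accuracy.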

7.
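Neural networks tend to become trapped in local minima. A dynamic tunneling neural network lets the objective function escape a local minimum by “tunneling” to a feasible region with a lower value, thereby avoiding being trapped. In traditional dynamic tunneling, the tunneling direction is single and arbitrary, which makes the technique unstable. To improve the search efficiency of dynamic tunneling, an improved dynamic tunneling neural network algorithm is proposed. The algorithm increases the number of tunnels searched, introduces an included-angle elasticity coefficient to control the tunnel direction, and examines the interaction between tunnels. In secondary structure prediction experiments on alpha, beta and coil type proteins, the improved algorithm outperforms both the plain neural network algorithm and the traditional dynamic tunneling neural network algorithm.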

8.
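Wang Yanchun, Application Research of Computers (《计算机应用研究》), 2009, 26(10): 3687-3689
To improve the accuracy of protein secondary structure prediction, a two-layer prediction model based on a GEP-BP network ensemble is proposed. First, the global search capability of gene expression programming (GEP) is used to evolve the structure and connection weights of BP networks simultaneously; the individuals of the final generation are further trained with the BP algorithm, and a combination method then integrates some of these individuals to form the first layer of the model. Because the outputs of the neural networks are correlated, a second-layer network refines the predictions of the first layer. Tests on 36 proteins from PDBSelect25, totaling 6,122 residues, show that the proposed model predicts protein secondary structure effectively, raising the prediction accuracy to 73.02%.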

9.
10.
Accurate protein secondary structure prediction (PSSP) is essential for identifying a protein's structural class, fold, and tertiary structure. Experimental methods for identifying secondary structure offer higher precision at the expense of high cost and time. In this study, we propose an effective prediction model that combines 42-dimensional hybrid features with a convolutional neural network (CNN) and a bidirectional recurrent neural network (BRNN). The proposed model is assessed on four benchmark datasets, CB6133, CB513, CASP10, and CASP11, using the Q3, Q8, and segment overlap (Sov) metrics. It reported Q3 accuracies of 85.4%, 85.4%, 83.7%, and 81.5%, and Q8 accuracies of 75.8%, 73.5%, 72.2%, and 70% on the CB6133, CB513, CASP10, and CASP11 datasets respectively. Compared with popular existing models on the CB513 dataset, the results improve by at least 2.5% in Q3 accuracy and 2.1% in Q8 accuracy. Further, the quality of the Q3 results is validated by structural class prediction and compared with PSI-PRED. The experiment showed that the quality of the Q3 results of the proposed model is higher than that of PSI-PRED.

11.
A Cascade Correlation Learning Architecture (CCLA) of neural networks is tested on the task of predicting the secondary structure of proteins. The results are compared with those obtained with Neural Networks (NN) trained with the back-propagation algorithm (BPNN) and generated with genetic algorithms. CCLA proceeds towards the global minimum of the error function more efficiently than BPNN. However, only a slight improvement in the average efficiency value is noticeable (61.82% as compared with 61.61% obtained with BPNN). The values of the three correlation coefficients for the discriminated secondary structures are also rather similar (Cα, Cβ and Ccoil are 0.36, 0.29 and 0.36 with CCLA, and 0.36, 0.31 and 0.35 with BPNN). This indicates that the efficiency of the prediction does not depend upon the training algorithm, and confirms our previous observation that when single sequences are used as input code to the network system, different NN architectures can perform similarly.

12.
Protein structure prediction (PSP) has a large potential for valuable biotechnological applications. However, the prediction itself encompasses a difficult optimization problem with thousands of degrees of freedom and is associated with extremely complex energy landscapes. In this work a simplified three-dimensional protein model (hydrophobic-polar model, HP, in a cubic lattice) was used in order to allow for the fast development of a robust and efficient genetic algorithm based methodology. The new methodology employs a phenotype based crowding mechanism for the maintenance of useful diversity within the populations, which resulted in increased performance and granted the algorithm multiple solutions capabilities. Tests against several benchmark HP sequences and comparative results showed that the proposed genetic algorithm is superior to other evolutionary algorithms. The proposed algorithm was then successfully adapted to an all-atom protein model and tested on poly-alanines. The native structure, an alpha helix, was found in all test cases as a local or a global minimum, in addition to other conformations with similar energies. The results showed that optimization strategies with multiple solutions capability present two advantages for PSP applications. The first one is a more efficient investigation of complex energy landscapes; the second one is an increase in the probability of finding native structures, even when they are not at the global optimum.
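
The objective that a search method minimizes in the HP cubic-lattice model can be sketched as below. The abstract does not give its energy function, so this uses the conventional HP definition as an assumption: each pair of hydrophobic (H) residues on neighboring lattice sites that are not consecutive in the chain contributes -1. The function name and input format are illustrative.

```python
def hp_energy(sequence, coords):
    """Energy of an HP-model conformation on a cubic lattice.

    sequence: string of 'H' (hydrophobic) and 'P' (polar) residues.
    coords: list of integer (x, y, z) tuples, one lattice site per residue.
    Returns minus the number of H-H contacts between non-consecutive residues.
    """
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have the same length")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    energy = 0
    for i in range(len(sequence)):
        for j in range(i + 2, len(sequence)):  # skip chain neighbors
            if sequence[i] == "H" and sequence[j] == "H":
                # Manhattan distance 1 means adjacent lattice sites.
                distance = sum(abs(a - b) for a, b in zip(coords[i], coords[j]))
                if distance == 1:
                    energy -= 1
    return energy
```

A genetic algorithm for this model would encode conformations as individuals and use a function like this as the fitness to minimize. The sketch does not check that consecutive residues occupy adjacent sites.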

13.
Rahim, Mehul, Tinku, Chaitali, Pattern Recognition, 2006, 39(12): 2494-2505
Predicting the protein structure from an amino acid sequence is computationally very intensive. In order to speed up protein sequence matching and processing, we present a novel coprocessor architecture for fast protein structure prediction. The architecture consists of systolic arrays to speed up the data intensive sequence alignment and structure prediction steps, and finite state machines for the control dominated steps. The architecture has been synthesized using Synopsys DC Compiler in 0.18 micron CMOS technology and details of its area and timing performance have been provided. A procedure to develop architectures with area-time trade-offs has also been presented.

14.
We present a new method for predicting RNA secondary structure based on a genetic algorithm. The algorithm is designed to run on a massively parallel SIMD computer. Statistical analysis shows that the program performs well when compared to a dynamic programming algorithm used to solve the same problem. The program has also pointed out a long-standing simplification in the implementation of the original dynamic programming algorithm that sometimes causes it not to find the optimal secondary structure.

15.
Protein function prediction is an important problem in functional genomics. Typically, protein sequences are represented by feature vectors. A major problem of protein datasets that increases the complexity of classification models is their large number of features. Feature selection (FS) techniques are used to deal with this high-dimensional feature space. In this paper, we propose a novel feature selection algorithm that combines genetic algorithms (GA) and ant colony optimization (ACO) for faster and better search capability. The hybrid algorithm makes use of the advantages of both the ACO and GA methods. The proposed algorithm is easy to implement and, because it uses a simple classifier, its computational complexity is very low. The performance of the proposed algorithm is compared to that of two prominent population-based algorithms, ACO and genetic algorithms. Experimentation is carried out using two challenging biological datasets, involving the hierarchical functional classification of GPCRs and enzymes. The criteria used for comparison are maximizing predictive accuracy and finding the smallest subset of features. The results of the experiments indicate the superiority of the proposed algorithm.

16.
Thyroid hormones are essential for all metabolic and reproductive activities and are significant for growth and neuron development in the human body. Thyroid hormone dysfunction has many ill consequences for the human population, making it a global epidemic. One in every 10 people in India is reported to suffer from a thyroid disorder. In recent years, many researchers have implemented disease prediction models based on Information and Communications Technology (ICT). Increasing the accuracy of disease classification is a critical and challenging task. To increase classification accuracy, in this paper we propose a hybrid optimization algorithm-based feature selection design for a thyroid disease classifier with a rough type-2 fuzzy support vector machine. This work uses a hybrid optimization algorithm that combines the firefly algorithm (FA) and the butterfly optimization algorithm (BOA) to select the top-n features. The proposed hybrid firefly butterfly optimization-rough type-2 fuzzy support vector machine (HFBO-RT2FSVM) is evaluated with several key metrics such as specificity, accuracy, and sensitivity. We compare our approach with well-known benchmark methods such as improved grey wolf optimization linear support vector machine (IGWO Linear SVM) and mixed-kernel support vector machine (MKSVM). The experimental evaluations show that our technique improves accuracy by a large margin and is therefore precise in identifying thyroid disease. The HFBO-RT2FSVM model attained an accuracy of 99.28%, with specificity and sensitivity of 98% and 99.2%, respectively.

17.
This paper presents the design, implementation and application of a constraint programming framework on 3D crystal lattices. The framework provides the flexibility to express and resolve constraints dealing with structural relationships of entities placed in a 3D lattice structure in space. Both sequential and parallel implementations of the framework are described, along with experiments that highlight its superior performance with respect to the use of more traditional frameworks (e.g. constraints on finite domains and integer programming) to model lattice constraints. The framework is motivated and applied to address the problem of solving the protein folding prediction problem, i.e. predicting the 3D structure of a protein from its primary amino acid sequence. Results and comparison with performance of other constraint‐based solutions to this problem are presented. Copyright © 2007 John Wiley & Sons, Ltd.

18.
Protein secondary structure prediction has a fundamental influence on today’s bioinformatics research. In this work, tertiary classifiers for protein secondary structure prediction are implemented on the Denoeux Belief Neural Network (DBNN) architecture. The hydrophobicity matrix, orthogonal matrix, BLOSUM62 matrix and PSSM matrix are tested separately as encoding schemes for DBNN. The hydrophobicity, BLOSUM62 and PSSM matrices are applied to the DBNN architecture for the first time. The experimental results contribute to the design of new encoding schemes. The accuracy of our tertiary classifier with the PSSM encoding scheme reaches 72.01%, which is almost 10% better than the previous results obtained in 2003. Because training the neural networks is time consuming, Pthread and OpenMP are employed to parallelize DBNN on a Hyper-Threading enabled Intel architecture. The speedup is 4.9 for 16 Pthreads and 4 for 16 OpenMP threads on a 4-processor shared-memory architecture. The speedups of both OpenMP and Pthread are superior to those reported in other research. With the new parallel training algorithm, thousands of amino acids can be processed in a reasonable amount of time. Our research also shows that Hyper-Threading technology for the Intel architecture is efficient for parallel biological algorithms.
Yi Pan (corresponding author)

19.
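Using the molecular electronegativity distance vector (MEDV-B), which starts from the primary topological structure of the molecule and combines the electronic and bonding properties of the non-hydrogen atoms with the relative distances between atoms, a quantitative structure-activity relationship (QSAR) study was carried out on 58 angiotensin-converting enzyme (ACE) inhibitor dipeptides and 48 bitter-tasting (BT) dipeptides. Multiple linear regression was used to build a correlation model between the vector descriptors and the observed activities, and its predictive power was checked by leave-one-out cross-validation (LOO-CV), with fairly satisfactory results (ACE: n=58, m=10, R=0.894, RCV=0.818; BT: n=48, m=10, R=0.947, RCV=0.898). Stepwise regression was then used to screen and optimize the variables and build new models, whose stability and predictive power were further improved (ACE: n=58, m=5, R=0.859, RCV=0.824; BT: n=48, m=5, R=0.931, RCV=0.908). The results show that the vector descriptor can be used for dipeptide structure characterization and biological function prediction, and is simple to compute.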
