
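A machine learning study of the spectral properties of Gloeobacter violaceus rhodopsin
Cite this article: Lili JIA, Tingting SUN. A machine learning study of the spectral properties of Gloeobacter violaceus rhodopsin[J]. Journal of Zhejiang University (Sciences Edition), 2022, 49(3): 280-286.
Authors: Lili JIA, Tingting SUN
Affiliation: School of Sciences, Zhejiang University of Science and Technology, Hangzhou 310023, Zhejiang, China
Funding: Natural Science Foundation of Zhejiang Province (LY17A040001)
Abstract: In recent years, artificial intelligence techniques such as machine learning have been applied to protein engineering, where they offer distinct advantages in research on protein structure, function prediction, and catalytic activity. When the protein structure is unknown, protein sequences and functional properties can be combined with machine learning. Based on the innovative sequence-activity relationship (ISAR) algorithm, the amino acid sequence is digitized, preprocessed with the fast Fourier transform (FFT), and then modeled by partial least squares regression, which allows the best model to be fitted even when the data set is small. Machine learning was used to model the relationship between the amino acid sequences of Gloeobacter violaceus rhodopsin (GR) mutants and their maximum spectral absorption wavelength, and the best model was obtained. The model built with the best index, LEVM760106, has a coefficient of determination R² of 0.944 and a mean square error E of 11.64. Preprocessing with the wavelet transform also gives an R² of about 0.944, but E is greater than 11.64, so it is inferior to FFT preprocessing. The method addresses the mathematical modeling of the relationship between protein sequence and functional properties, and can support the prediction of better mutants in protein engineering.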

Keywords: machine learning; digital signal processing; spectral properties
Received: 2021-03-02
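
The digitization and Fourier preprocessing steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The `TOY_INDEX` table holds made-up placeholder numbers (it is not the real LEVM760106 scale from AAindex), the sequences are invented, and using the magnitude spectrum as the feature vector is an assumption, since the abstract does not say which part of the transform is passed to the regression. A naive discrete Fourier transform is used to keep the sketch dependency-free; an FFT yields the same values more efficiently.

```python
import cmath

# Hypothetical placeholder scale: NOT the real LEVM760106 values from AAindex.
TOY_INDEX = {"A": 1.0, "G": 0.5, "L": 2.0, "S": 0.8, "T": 1.2}


def encode(sequence, index):
    """Map each residue to its numeric index value (the digitization step)."""
    return [index[residue] for residue in sequence]


def dft_magnitudes(signal):
    """Naive O(n^2) discrete Fourier transform; returns |X[k]| for each k."""
    n = len(signal)
    spectrum = []
    for k in range(n):
        total = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        spectrum.append(abs(total))
    return spectrum


def sequence_features(sequence, index=TOY_INDEX):
    """Numeric encoding followed by the magnitude spectrum (assumed feature)."""
    return dft_magnitudes(encode(sequence, index))


wild_type = "GALSTALG"  # invented sequence, for illustration only
mutant = "GALSTLLG"     # same sequence with A replaced by L at position 6 (1-based)

features_wt = sequence_features(wild_type)
features_mut = sequence_features(mutant)
```

Each sequence of length n becomes a vector of n spectral magnitudes, so a set of equal-length mutants forms a feature matrix that a regression method such as partial least squares could be fitted to.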

A machine learning study on Gloeobacter violaceus rhodopsin spectral properties
Lili JIA, Tingting SUN. A machine learning study on Gloeobacter violaceus rhodopsin spectral properties[J]. Journal of Zhejiang University (Sciences Edition), 2022, 49(3): 280-286.
Authors: Lili JIA, Tingting SUN
Affiliation: School of Sciences, Zhejiang University of Science and Technology, Hangzhou 310023, China
Abstract: In recent years, artificial intelligence technologies such as machine learning have been applied to protein engineering and have shown unique advantages in studies of protein structure, function prediction, and catalytic activity. When the protein structure is unknown, combining protein sequence and functional properties with machine learning is a new research direction. In this paper, based on the innovative sequence-activity relationship (ISAR) method, the relationship between the mutant library of Gloeobacter violaceus rhodopsin (GR) and the maximum absorption wavelength of the spectrum is modeled by machine learning. The method can fit the best model even when the data set is small. It digitizes the protein amino acid sequence, preprocesses it with the fast Fourier transform (FFT), and then performs partial least squares regression (PLSR) modeling. In this way, the best model relating the amino acid sequences of the rhodopsin mutants to the maximum absorption wavelength of the spectrum is obtained. With the best index, LEVM760106, the coefficient of determination R² is 0.944 and the minimum mean square error E is 11.64. In contrast, when the wavelet transform is used to preprocess the data, the coefficient of determination is close to 0.944 but E is greater than 11.64, which is not as good as the result of FFT preprocessing. These results show that the method effectively models the mathematical relationship between protein sequence and functional characteristics, and provides support for predicting better mutants in later protein engineering.
Keywords: machine learning; digital signal processing (DSP); spectral characteristics
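
The two figures of merit quoted in the abstract, the coefficient of determination R² and the mean square error E, can be computed as in the sketch below. This assumes the standard textbook definitions. The abstract does not state whether the reported values come from fitting or from cross-validation, and the wavelengths here are invented numbers for illustration, not data from the paper.

```python
def mean_squared_error(observed, predicted):
    """Average of squared differences between observed and predicted values."""
    n = len(observed)
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Invented absorption maxima in nm, for illustration only.
observed = [540.0, 550.0, 560.0, 570.0]
predicted = [542.0, 548.0, 561.0, 569.0]

mse = mean_squared_error(observed, predicted)
r2 = r_squared(observed, predicted)
```

Two preprocessing choices can reach nearly the same R² while differing in E, which is why the abstract uses E to separate the FFT and wavelet variants.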