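Application of an ensemble partial least squares regression method to quantitative near-infrared spectral analysis   Total citations: 3, self-citations: 1, citations by others: 3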
Cheng Zhong (成忠)  Zhu Aishi (诸爱士)  Chen Dezhao (陈德钊) 《分析化学》(Chinese Journal of Analytical Chemistry) 2007,35(7):978-982
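Near-infrared spectral data show pronounced local effects, contain many variables that are often severely multicollinear, and are frequently related nonlinearly to the component contents of the samples. To address this, an ensemble nonlinear partial least squares regression method (E-S-QPLSR) was constructed. It uses sampling without replacement (subagging) to generate several subsamples from the training samples. A sub-model is then built on each subsample by quadratic polynomial partial least squares regression (QPLSR) and used to predict the dependent variable of the training samples quantitatively. These predictions are passed to a linear PLS algorithm, which computes the combination weights of the sub-models. The method was applied to model the quantitative relationship between the moisture content of 80 corn samples and their near-infrared spectra. It performed well and showed strong learning ability, and the resulting model also predicted better than other methods.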
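
The subagging step described above can be sketched as follows. This is only an illustration of drawing subsamples without replacement. The abstract gives the sample count (80) but not the number of subsamples or their size, so `n_subsets=10` and `subset_size=60` are assumptions, and the function name is hypothetical. Fitting the QPLSR sub-models and the linear PLS combiner is not shown.

```python
import random


def subagging_indices(n_samples, n_subsets, subset_size, seed=0):
    """Draw n_subsets index subsets; indices within a subset never repeat."""
    if subset_size > n_samples:
        raise ValueError("subset_size cannot exceed n_samples")
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return [
        sorted(rng.sample(range(n_samples), subset_size))
        for _ in range(n_subsets)
    ]


# 80 samples as in the abstract; subset count and size are assumed values.
subsets = subagging_indices(n_samples=80, n_subsets=10, subset_size=60)
```

Each index list would select the rows of the spectra and reference values used to fit one sub-model.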

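Determination of the fatty acid composition of lean meat by partial least squares near-infrared spectroscopy   Total citations: 2, self-citations: 0, citations by others: 2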
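Partial least squares was used to build calibration models relating the near-infrared spectra of lean meat to its palmitic, palmitoleic, stearic, oleic and linoleic acid contents, and the reliability of the models was assessed by cross-validation and external validation. The calibration correlation coefficients of the five fatty acid models were 0.9998, 0.9844, 0.9963, 0.9754 and 0.9969, respectively; the root mean square errors of calibration (RMSEC) were 0.0231, 0.0485, 0.111, 0.373 and 0.311; and the root mean square errors of cross-validation (RMSECV) were 0.509, 0.115, 0.225, 0.848 and 0.649. The near-infrared models were used to predict the fatty acid composition of lean meat, and paired t-tests between the predicted values and the values measured by gas chromatography showed no significant difference for any of the fatty acids (p>0.05).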

Zhang Ruoqiu (张若秋)  Du Yiping (杜一平) 《分析测试学报》(Journal of Instrumental Analysis) 2020,39(10):1282-1287
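In practical multivariate calibration, many factors affect the predictive performance of partial least squares (PLS) models. Instrument noise, which originates with the spectral data themselves, is an important one. Previous work has mostly used filters or smoothing methods to reduce the influence of instrument noise, but how instrument noise affects the PLS modeling process and the predictive ability of the model has rarely been reported. This paper describes and demonstrates how instrument noise is introduced into the model through the calculation of the first latent variable. A theoretical derivation of the PLS calculation shows that the introduced noise has a cumulative effect on the PLS weight and loading vectors and propagates through the model as subsequent latent variables are calculated, thereby affecting the PLS model. The prediction error of the PLS model is also decomposed theoretically into the error of the ideal noise-free model and the error caused by noise propagation. The results show that instrument noise not only degrades the predictive performance of the PLS model but also affects the choice of its optimal complexity.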

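Prediction of starch content in corn by near-infrared spectroscopy and partial least squares regression   Total citations: 1, self-citations: 0, citations by others: 1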
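Using ordinary corn kernels as the test material, a calibration model for determining the starch content of corn kernels from near-infrared spectral data was built by partial least squares regression. The calibration error (RMSEC), cross-validation error (RMSECV) and prediction error (RMSEP) of the model were 0.31%, 0.42% and 0.29%, respectively, and the correlation coefficients between predicted and measured values reached 0.9255 for the calibration set and 0.9310 for the independent test set. This indicates that the calibration model has high predictive accuracy and generalizes well, providing a new route to the rapid, non-destructive determination of starch content in corn kernels.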
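
For readers unfamiliar with the figures of merit quoted above, the sketch below shows one common way to compute a root mean square error and a Pearson correlation coefficient between measured and predicted values. The abstract does not state the exact formulas, so the plain `n` denominator is an assumption (some RMSEC definitions subtract degrees of freedom). The function names and the four data points are hypothetical and are not taken from the paper.

```python
import math


def rmse(measured, predicted):
    """Root mean square error with a plain n denominator (an assumption)."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(m - p) ** 2 for m, p in zip(measured, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up starch contents (%) for illustration only.
measured = [70.1, 71.3, 69.8, 72.0]
predicted = [70.4, 71.0, 70.0, 71.7]
error = rmse(measured, predicted)
r = pearson_r(measured, predicted)
```

The same `rmse` function gives RMSEC, RMSECV or RMSEP depending on whether the predictions come from the calibration set, cross-validation, or an independent test set.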

Xu Yonghua (许永花)  Wang Na (王娜)  Liu Jinming (刘金明) 《分析化学》(Chinese Journal of Analytical Chemistry) 2022,50(10):1587-1596
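In biogas production, the content of the lignocellulosic components of corn stover (cellulose, hemicellulose and lignin) strongly affects anaerobic fermentation performance. Because traditional methods for determining lignocellulose are time-consuming and costly, this study examined the feasibility of rapidly determining the lignocellulose content of corn stover by near-infrared spectroscopy (NIRS) combined with chemometrics. To improve the accuracy and efficiency of the NIRS models, a genetic simulated annealing algorithm (GSA), interval partial least squares (iPLS) and a support vector machine (SVM) were combined into a genetic simulated annealing interval support vector machine (GSA-iSVM), which optimizes the NIRS characteristic spectral regions and the SVM parameters simultaneously. Its modeling performance was compared with that of the characteristic spectral regions selected by backward interval partial least squares (BiPLS) and by genetic simulated annealing interval partial least squares (GSA-iPLS). The calibration models for cellulose and lignin performed best when built with GSA-iSVM, and the calibration model for hemicellulose performed best when built with GSA-iPLS. For the validation sets of the best cellulose, hemicellulose and lignin calibration models, the coefficients of determination of prediction (Rp2) were 0.910, 0.990 and 0.939, the root mean square errors of prediction (RMSEP) were 0.881%, 0.707% and 0.249%, and the residual predictive deviations (RPD) were 3.283, 10.235 and 4.27...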
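
The RPD values quoted above are usually understood as the standard deviation of the reference values divided by the prediction error. The abstract does not give its formula, so the sketch below assumes that common definition, with the sample standard deviation (n - 1) in the numerator and RMSEP in the denominator. The function name and the example values are hypothetical.

```python
import math
import statistics


def rpd(measured, predicted):
    """RPD as SD of reference values over RMSEP (an assumed definition)."""
    if len(measured) != len(predicted) or len(measured) < 2:
        raise ValueError("need at least two paired values")
    sd = statistics.stdev(measured)  # sample standard deviation (n - 1)
    rmsep = math.sqrt(
        sum((m - p) ** 2 for m, p in zip(measured, predicted)) / len(measured)
    )
    if rmsep == 0:
        raise ValueError("RMSEP is zero; RPD is undefined")
    return sd / rmsep


# Made-up reference and predicted values for illustration only.
value = rpd([1, 2, 3, 4, 5], [1.5, 1.5, 3.5, 3.5, 5.5])
```

Under this definition a larger RPD means the prediction error is small relative to the natural spread of the reference values.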

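Semi-supervised learning can make full use of large numbers of unlabeled samples to compensate for a shortage of labeled samples. When near-infrared spectroscopy is used to build analytical models for complex systems such as agricultural products, large numbers of accurately labeled samples are hard to obtain, and models built from a few labeled samples or from many inaccurately labeled samples perform poorly. To address this, a semi-supervised partial least squares (SS-PLS) method based on the idea of semi-supervised self-training is proposed to optimize the model. Taking as an example the near-infrared spectra and corresponding sensory evaluation data of 211 raw tobacco leaf samples of different grades from different producing areas across China, the model was optimized with SS-PLS and its performance improved markedly over the original model. After optimization, the coefficient of determination (R2) of the SS-PLS model reached about 90%, the ratio of the standard deviation of the reference values used in modeling to the standard deviation of the fitted values (Ratio of Performance to Deviation, RPD) exceeded 3.0, and the standard error of cross validation (SECV) and standard error of prediction (SEP) fell below 1.0. The original sensory evaluation data and the SS-PLS-optimized data were also divided into three grades (good, medium and poor) using fixed thresholds. Analysis with a projection model based on principal component and Fisher criterion (PPF) showed that the classification obtained after SS-PLS optimization was also markedly better than that obtained from the original sensory evaluation data. SS-PLS can address the data representativeness problem of modeling with small sample sets, and it offers a new chemometric method for building near-infrared spectral analysis models when large numbers of accurately labeled samples are hard to obtain.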

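Taking as examples the transfer of partial least squares near-infrared spectroscopy (PLS-NIRs) models for the contents of four main corn components (moisture, protein, fat and starch) and for total alkaloids in tobacco leaf, the effect of the number of latent variables (nLVs) on model transfer error was examined. When nLVs was chosen so that the cumulative contribution rate exceeded 99.9%, the corn and tobacco PLS-NIRs models had nLVs of 1 and 13, respectively. For the corn model built with nLVs=1, the reproducibility between the values predicted for samples on two slave instruments and the values predicted on the master instrument met the national standard for all four components. The tobacco total alkaloid model built with nLVs=13, after piecewise direct standardization (PDS), gave a mean relative prediction error (MRE) below 6% for samples on four slave instruments. When nLVs was determined by leave-one-out or four-fold cross-validation, the values were 5~10 for the corn models and 16 and 19 for the tobacco models. The corn PLS-NIRs models built with these nLVs gave markedly larger prediction errors for slave-instrument samples, exceeding the permitted error range, and even after PDS correction the reproducibility between slave-instrument and master-instrument predictions mostly failed the national standard. For the tobacco total alkaloid PLS-NIRs models built with nLVs>13, the transfer error increased with nLVs, and PDS correction could not guarantee an MRE below 6% for all slave-instrument samples. Selecting nLVs by the criterion that the cumulative contribution rate exceeds or approaches 99.9% can...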
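
The selection rule described above (take the smallest number of latent variables whose cumulative contribution rate exceeds 99.9%) can be sketched as below. The abstract does not say how the contribution rate is computed, so the input is assumed to be a list of per-latent-variable contribution fractions that the reader has already obtained from a fitted model. The function name and the example fractions are hypothetical.

```python
def n_lvs_by_cumulative_contribution(contributions, threshold=0.999):
    """Return the smallest k whose cumulative contribution exceeds threshold.

    contributions: per-latent-variable fractions, in extraction order.
    Returns None if the threshold is never exceeded.
    """
    total = 0.0
    for k, fraction in enumerate(contributions, start=1):
        total += fraction
        if total > threshold:
            return k
    return None


# Made-up contribution fractions for illustration only.
first_case = n_lvs_by_cumulative_contribution([0.9995, 0.0003])
second_case = n_lvs_by_cumulative_contribution([0.9, 0.09, 0.0095, 0.0004])
```

A cross-validation criterion would instead pick the nLVs that minimizes RMSECV, which, according to the abstract, led to larger nLVs and larger transfer errors.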

Variable selection by genetic algorithms in partial least squares modeling   Total citations: 19, self-citations: 0, citations by others: 19
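Genetic algorithms (GA), a global search method, were used to select wavelength variables in near-infrared spectral analysis, and the partial least squares (PLS) method was then used to build the calibration model. Application to the near-infrared analysis of two types of samples showed that calibrating on selected variables not only simplifies and optimizes the model but also strengthens its predictive ability, and that it is particularly suited to systems that are difficult to calibrate with PLS alone.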

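Yantai sweet cherries were chosen as the study object, and a portable spectrometer was used to measure their sugar content. The visible/near-infrared spectra were preprocessed by range normalization and wavelet filtering. Quantitative models of cherry sugar content were built by principal component regression (PCR) and by partial least squares regression (PLSR), and the two models were compared. The results show that measuring cherry sugar content in the 600~1100 nm band is feasible and that the PLSR model outperforms the PCR model.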

Hui Chen  Zan Lin  Tong Wu 《Analytical letters》2018,51(17):2695-2707
Textile products must be labeled with fabric type and composition. Cotton is by far the most important fiber in the industry and often needs fast quantitative analysis, but the corresponding standard methods are very time-consuming and labor-intensive. This work explores the feasibility of combining near-infrared (NIR) spectroscopy and interval-based partial least squares (iPLS) for determining cotton content in textiles. Three types of partial least squares (PLS)-based algorithms were used for experimental measurements. A total of 91 cloth samples with cotton content ranging from 0 to 100% (w/w) were collected; all compositions are commercially available on the market in China. In all cases, the original spectrum axis was split into 20 subintervals. Three final models, i.e., the iPLS model on a single subinterval, the backward interval partial least squares (biPLS) model on the region of the six remaining subintervals, and the moving window partial least squares (mwPLS) model with a window of 75 variables, achieved better results than the full-spectrum PLS model. No obvious differences in performance were observed among the three models. Thus, either iPLS or mwPLS was preferred considering their simplicity, which suggests that iPLS and mwPLS combined with the NIR technique may have potential for the rapid determination of the cotton content of textile products with accuracy comparable to standard procedures. In addition, this approach may have commercial and regulatory advantages because it avoids labor-intensive and time-consuming chemical analysis.
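
The interval step that iPLS, biPLS and mwPLS all start from (splitting the spectrum axis into 20 subintervals) can be sketched as follows. The abstract does not say how many spectral variables there were or how leftover variables were distributed, so the count of 1557 and the rule of giving the first intervals one extra variable are assumptions. The function name is hypothetical.

```python
def split_into_intervals(n_variables, n_intervals):
    """Return (start, stop) index pairs covering range(n_variables).

    Intervals are contiguous and near-equal; when the division is not exact,
    the first few intervals receive one extra variable (an assumption).
    """
    if not 0 < n_intervals <= n_variables:
        raise ValueError("need 0 < n_intervals <= n_variables")
    base, extra = divmod(n_variables, n_intervals)
    bounds = []
    start = 0
    for i in range(n_intervals):
        size = base + (1 if i < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


# 20 subintervals as in the abstract; 1557 variables is an assumed count.
intervals = split_into_intervals(n_variables=1557, n_intervals=20)
```

An iPLS search would then fit one PLS model per `(start, stop)` slice of the spectra and keep the interval with the lowest cross-validation error.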

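Protein content is an important index of fishmeal quality. This paper combined near-infrared (NIR) spectroscopy with feature selection to build a rapid quantitative model of fishmeal protein content. Interval partial least squares (iPLS) was combined with a differential evolution (DE) algorithm using a binary mutation strategy to form an interval partial least squares differential evolution (iPLS-DE) scheme for wavelength selection, which was used to select characteristic wavelengths from the fishmeal NIR spectra. iPLS-DE tunes the number of equal subintervals in iPLS to select the 9 best characteristic bands, then uses the binary-mutation DE algorithm to select combinations of discrete characteristic wavelengths within those bands. Finally, the preferred iPLS-DE model is determined from the model evaluation indices and compared with the preferred iPLS model. The results show that when the full fishmeal spectrum was divided equally into 5 subintervals, the preferred model built from the 50 discrete characteristic wavelengths selected by iPLS-DE gave a root mean square error of prediction of 1.033% and a relative analysis error of 4.058 for the test-set samples, whereas the preferred iPLS model gave 1.131% and 3.855. This indicates that iPLS-DE can effectively improve the ability of NIR spectral models to predict fishmeal protein content.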

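A new method for determining total nitrogen and geniposide in an intermediate of the traditional Chinese medicine Qingkailing injection was established using near-infrared spectroscopy combined with chemometrics. The samples were first divided into a training set and a prediction set by the Kennard-Stone method. Synergy interval partial least squares (siPLS) was then applied to the near-infrared transmission spectra to select the effective spectral ranges and to build quantitative calibration models for both analytes, and spectral preprocessing methods were discussed in detail. For the total nitrogen and geniposide calibration models, the prediction correlation coefficients (R) were 0.999 and 0.708, respectively; the root mean square errors of cross-validation (RMSECV) were both 0.023; and the root mean square errors of prediction (RMSEP) were 0.074 and 0.159, respectively. The predictions show that the method is rapid, non-destructive and reliable, and that it can be extended to online quality control of intermediates of traditional Chinese medicine injections.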
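
The Kennard-Stone split mentioned above can be sketched as follows: start from the two most distant samples, then repeatedly add the sample whose nearest already-selected neighbor is farthest away. The abstract names only the method, so the Euclidean distance on the raw feature vectors, the function name, and the toy one-dimensional data are all assumptions.

```python
import math


def kennard_stone(X, n_select):
    """Return indices of n_select samples chosen by the Kennard-Stone rule."""
    n = len(X)
    if not 2 <= n_select <= n:
        raise ValueError("need 2 <= n_select <= number of samples")
    # Pairwise Euclidean distances (assumed metric).
    dist = [[math.dist(a, b) for b in X] for a in X]
    # Seed with the two most distant samples.
    pairs = ((i, j) for i in range(n) for j in range(i + 1, n))
    first, second = max(pairs, key=lambda p: dist[p[0]][p[1]])
    selected = [first, second]
    remaining = [k for k in range(n) if k not in (first, second)]
    # Distance from each remaining sample to its nearest selected sample.
    nearest = {k: min(dist[k][first], dist[k][second]) for k in remaining}
    while len(selected) < n_select:
        nxt = max(remaining, key=lambda k: nearest[k])
        selected.append(nxt)
        remaining.remove(nxt)
        for k in remaining:
            nearest[k] = min(nearest[k], dist[k][nxt])
    return selected


# Toy one-dimensional "spectra" for illustration only.
samples = [[0.0], [1.0], [2.0], [10.0], [5.0]]
training_idx = kennard_stone(samples, 3)
```

The selected indices would form the training set and the remaining samples the prediction set.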

Shao Xueguang (邵学广)  Chen Da (陈达)  Xu Heng (徐恒)  Liu Zhichao (刘智超)  Cai Wensheng (蔡文生) 《中国化学》(Chinese Journal of Chemistry) 2009,27(7):1328-1332
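Partial least squares (PLS) plays an important role in quantitative near-infrared (NIR) spectral analysis, but its predictions are easily affected by factors such as how the samples are partitioned and the presence of outliers, so it is not very robust. Multi-model PLS (EPLS) improves model robustness but cannot identify outliers among the samples. To improve both predictive accuracy and robustness, this paper proposes a multi-model PLS method that resamples according to sampling probabilities, called robust consensus PLS (RE-PLS). The method computes the regression residuals of the samples by iteratively reweighted partial least squares (IRPLS) to obtain a sampling probability for each calibration sample, selects training subsets according to these probabilities to build multiple PLS models, and averages the predictions of all PLS models to give the final prediction. The method was used to model the NIR spectra of two different plant samples and was compared with conventional PLS and EPLS. The results show that it effectively prevents outliers in the calibration set from affecting the model while improving predictive accuracy and robustness. For complex NIR spectra of real tobacco samples containing many outliers, simple PLS or EPLS does not predict well, whereas RE-PLS, with its particular advantages, is expected to find wide use in this kind of complex quantitative spectral analysis.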

A direct, reagent-free, ultraviolet spectroscopic method for the simultaneous determination of nitrate (NO3), nitrite (NO2), and salinity in seawater is presented. The method is based on measuring the absorption spectra of raw seawater in the range of 200–300 nm, combined with partial least squares (PLS) regression for resolving the spectral overlap of NO3, NO2, and sea salt (or salinity). The interference from chromophoric dissolved organic matter (CDOM) UV absorbance was reduced according to its exponential relationship between 275 and 295 nm. The results of the cross-validation of calibration and the prediction sets were used to select the number of factors (4 for NO3, NO2, and salinity) and to optimize the wavelength range (215–240 nm) with a 1 nm wavelength interval. The linear relationship between the predicted and the actual values of NO3, NO2, and salinity, and the recovery of spiked water samples, suggest that the proposed PLS model can be a valuable alternative to the wet chemical methods. Due to its simplicity and fast response, the proposed PLS model can be used as an algorithm for building nitrate and nitrite sensors. A comparison of PLS with a classical least squares (CLS) model shows that both can give satisfactory results for predicting NO3 and salinity. However, for NO2 in some samples, PLS is superior to CLS, which may be due to interference from unknown substances not included in the CLS algorithm. The proposed method was applied to the analysis of NO3, NO2, and salinity in Changjiang (Yangtze River) estuary water samples, and the results are comparable with those determined by the colorimetric Griess assay.

《Analytical letters》2012,45(10):1518-1526

This article presents a multivariate method for rapidly determining chlorpyrifos residue in white radish, based on near-infrared spectroscopy and partial least squares (PLS) regression. Interval PLS (iPLS) was utilized to select the optimum wavenumber range. The number of PLS components and the number of intervals were optimized according to the root mean square error of prediction (RMSEP) and the correlation coefficient (R) in the prediction set. The results showed that the iPLS model was more reliable than the full model and that near-infrared spectroscopy with the iPLS algorithm could be used successfully to analyze chlorpyrifos residue in white radish.

《Analytical letters》2012,45(12):1910-1921
Multiblock partial least squares (MB-PLS) is applied to the determination of corn and tobacco samples by near-infrared diffuse reflection spectroscopy. In the model, the spectra are separated into several sub-blocks along the wavenumber axis, and a different number of latent variables is used for each sub-block. Compared with ordinary PLS, the importance and the contribution of each sub-block can be balanced by super-weights and by the use of different latent variable numbers. Therefore, the prediction obtained by the MB-PLS model is superior to that of ordinary PLS, especially for the large data sets of tobacco samples with a large number of variables.

In the present work, a fast, relatively cheap, and green analytical strategy to identify and quantify the fraudulent (or voluntary) addition of a drug (alprazolam, the API of Xanax®) to a widely consumed alcoholic drink, namely gin and tonic, was developed by coupling near-infrared spectroscopy (NIR) and chemometrics. The approach was both qualitative and quantitative: models were built that detect the presence of alprazolam with high accuracy and quantify its concentration with, in many cases, an acceptable error. Classification models built using partial least squares discriminant analysis (PLS-DA) identified whether or not a drink was spiked with the drug, with a prediction accuracy in the validation phase often higher than 90%. Calibration models established by partial least squares (PLS) regression quantified the added drug with errors of the order of 2–5 mg/L.

