Similar literature
20 similar articles found.
1.
A class of multivariate calibration methods called augmented classical least squares (ACLS) has been proposed which combines an explicit linear additive model with the predictive power of inverse models, such as principal component regression (PCR) and partial least squares (PLS). Because of its use of the explicit linear additive model, ACLS provides an interesting framework to incorporate different sources of prior information, such as measured pure component spectra, in the model. In this study, the predictive power of ACLS models incorporating different amounts of prior information has been compared to that of PCR and PLS using two examples, a designed experiment and one with biological samples. In both cases, the ACLS models showed predictive power comparable to PLS under idealized validation conditions. When a different interferent structure was present in the validation samples, the predictive power of the inverse models (PCR and PLS) dramatically decreased, with an increase in root-mean-squared error of prediction by a factor of 3.5 for the first example and a factor of 2 in the second example. The incorporation of prior information in the ACLS framework was found to considerably reduce or even completely remove these dramatic effects, especially when the pure component contributions for the interferents were taken into account.  相似文献   
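The comparison above is reported as a ratio of root-mean-squared errors of prediction (RMSEP). As a minimal sketch of that arithmetic, the snippet below computes RMSEP for two sets of predictions and the resulting inflation factor. The numbers are invented for illustration and are not taken from the study.

```python
import math

def rmsep(reference, predicted):
    """Root-mean-squared error of prediction over a validation set."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(r - p) ** 2 for r, p in zip(reference, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical validation results (assumed values, not from the paper).
reference = [1.0, 2.0, 3.0, 4.0]
ideal_pred = [1.1, 1.9, 3.1, 3.9]        # same interferents as in calibration
shifted_pred = [1.35, 1.65, 3.35, 3.65]  # different interferent structure

# Factor by which the prediction error grows under the new conditions.
inflation = rmsep(reference, shifted_pred) / rmsep(reference, ideal_pred)
```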

2.
Cyclic subspace regression (CSR) is a new approach to the complex multivariate calibration problem. The simple algorithm produces solutions for principal component regression (PCR), partial least squares (PLS), least squares (LS), and other related intermediate regressions. This paper describes further analysis of CSR and shows that by using hat matrices, CSR regression vectors are formed from a summation of weighted eigenvectors where weights are determined from the hat matrix, singular values, and sample space eigenvectors. Examination of CSR weights for PCR and PLS further documents differences and similarities and provides information to assist in determining prediction rank for PCR and PLS. By redefining CSR in terms of weighted eigenvectors, it can be shown when PLS and PCR produce essentially the same results where minor differences stem from overfitting by PLS. Additionally, weights derived from the hat matrix show when PCR and PLS generate different results and why. Equations are shown for the sample space that reveal PLS to be a method based on oblique projections while PCR uses orthogonal projections. The optimal intermediate CSR model can be identified as well. A near infrared data set is studied and illustrates principles involved.  相似文献   

3.
A new wavelength interval selection procedure, moving window partial least-squares regression (MWPLSR), is proposed for multicomponent spectral analysis. This procedure builds a series of PLS models in a window that moves over the whole spectral region and then locates useful spectral intervals in terms of the least complexity of PLS models reaching a desired error level. Based on a proposed theory demonstrating the necessity of wavelength selection, it is shown that MWPLSR provides a viable approach to eliminate the extra variability generated by non-composition-related factors such as the perturbations in experimental conditions and physical properties of samples. A salient advantage of MWPLSR is that the calibration model is very stable against the interference from non-composition-related factors. Moreover, the selection of spectral intervals in terms of the least model complexity enables the reduction of the size of a calibration sample set in calibration modeling. Two strategies are suggested for coupling the MWPLSR procedure with PLS for multicomponent spectral analysis: One is the inclusion of all selected intervals to develop a PLS calibration model, and the other is the combination of the PLS models built separately in each interval. The combination of multiple PLS models offers a novel potential tool for improving the performance of individual models. The proposed procedures are evaluated using two open-path Fourier transform infrared data sets and one near-infrared data set, each having different noise characteristics. The results reveal that the proposed procedures are very promising for vibrational spectroscopy-based multicomponent analyses and give much better prediction than the full-spectrum PLS modeling.  相似文献   
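The moving-window idea can be sketched as follows: a PLS1 model is grown component by component inside each window, and the window is scored by the smallest number of components that reaches a target error. This is only an illustration under stated assumptions. It uses the calibration RMSE rather than a validated error, a plain-Python NIPALS-style PLS1, and hypothetical function names; the paper's exact criterion may differ.

```python
def min_components_for_window(X, y, tol, max_comp):
    """Smallest number of PLS1 components with calibration RMSE <= tol, else None."""
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    E = [[row[j] - col_means[j] for j in range(p)] for row in X]  # centered X
    y_mean = sum(y) / n
    f = [v - y_mean for v in y]                                   # centered y
    for a in range(1, min(max_comp, p) + 1):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = sum(v * v for v in w) ** 0.5
        if norm == 0.0:
            return None  # nothing left in X that covaries with y
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(t[i] * f[i] for i in range(n)) / tt
        # Deflate X and y before extracting the next component.
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        rmse = (sum(v * v for v in f) / n) ** 0.5
        if rmse <= tol:
            return a
    return None

def moving_window_pls(X, y, width, tol, max_comp=3):
    """Return (window_start, min_components) for every window position."""
    p = len(X[0])
    results = []
    for start in range(p - width + 1):
        window = [row[start:start + width] for row in X]
        results.append((start, min_components_for_window(window, y, tol, max_comp)))
    return results
```

Windows that reach the target with the fewest components are the candidate informative intervals; windows returning `None` never reach it within `max_comp` components.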

4.
A difficulty when applying partial least squares (PLS) in multivariate calibration is that overfitting may occur. This study proposes a novel approach by combining PLS and boosting. The latter is said to be resistant to overfitting. The proposed method, called boosting PLS (BPLS), combines a set of shrunken PLS models, each with only one PLS component. The method is iterative: the models are constructed on the basis of the residuals of the responses that are not explained by previous models. Unlike classical PLS, BPLS does not need to select an adequate number of PLS components to be included in the model. On the other hand, two parameters must be determined: the shrinkage value and the iteration number. Criteria are proposed for these two purposes. BPLS was applied to seven real data sets, and the results demonstrate that it is more resistant than classical PLS to overfitting without losing accuracy.
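The iteration described above (one-component PLS models fitted to the current residuals and added with a shrinkage factor) can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the function names, the default shrinkage of 0.1, and the fixed iteration count are assumptions, and the paper's criteria for choosing those two parameters are not reproduced.

```python
def one_component_pls(X, r):
    """Coefficients of a single-component PLS1 fit of residual r on centered X."""
    n, p = len(X), len(X[0])
    w = [sum(X[i][j] * r[i] for i in range(n)) for j in range(p)]  # X^T r
    norm = sum(v * v for v in w) ** 0.5
    if norm == 0.0:
        return [0.0] * p          # residual no longer covaries with X
    w = [v / norm for v in w]
    t = [sum(X[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
    q = sum(t[i] * r[i] for i in range(n)) / sum(v * v for v in t)
    return [q * wj for wj in w]

def boosting_pls(X, y, shrinkage=0.1, n_iter=200):
    """Sum of shrunken one-component PLS models; returns (coefficients, intercept)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    r = [v - y_mean for v in y]   # residuals not yet explained
    b = [0.0] * p
    for _ in range(n_iter):
        step = one_component_pls(Xc, r)
        b = [b[j] + shrinkage * step[j] for j in range(p)]
        fitted = [sum(Xc[i][j] * step[j] for j in range(p)) for i in range(n)]
        r = [r[i] - shrinkage * fitted[i] for i in range(n)]
    intercept = y_mean - sum(b[j] * x_mean[j] for j in range(p))
    return b, intercept
```

On noise-free, well-conditioned data the coefficients approach the least-squares solution as the iteration count grows; on real data the shrinkage value and the stopping point act as the regularization.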

5.
Multivariate data analysis was applied to confocal Raman measurements on stents coated with the polymers and drug used in the CYPHER Sirolimus-eluting Coronary Stents. Partial least-squares (PLS) regression was used to establish three independent calibration curves for the coating constituents: sirolimus, poly(n-butyl methacrylate) [PBMA], and poly(ethylene-co-vinyl acetate) [PEVA]. The PLS calibrations were based on average spectra generated from each spatial location profiled. The PLS models were tested on six unknown stent samples to assess accuracy and precision. The difference between PLS predictions and laboratory assay values for sirolimus was less than 1 wt % for the composite of the six unknowns, while the difference for the polymer models was estimated to be less than 0.5 wt % for the combined samples. The linearity and specificity of the three PLS models were also demonstrated. In contrast to earlier univariate models, the PLS models achieved mass balance with better accuracy. This analysis was extended to evaluate the spatial distribution of the three constituents. Quantitative bitmap images of drug-eluting stent coatings are presented for the first time to assess the local distribution of components.

6.
Typical process measurements are usually correlated with each other and compounded with various phenomena occurring at different time and frequency domains. To take into account this multivariate and multi-scale nature of process dynamics, a multi-scale PLS (MSPLS) algorithm combining PLS and wavelet analysis is proposed. The MSPLS first decomposes the process measurements into separated multi-scale components using on-line wavelet transform, and then the resultant multi-scale data blocks are modeled in the framework of multi-block PLS algorithm which can describe the global relationships across the entire scale blocks as well as the localized features within each sub-block at detailed resolutions. To demonstrate the feasibility of the MSPLS method, its process monitoring abilities were tested not only for the simulated data sets containing several fault scenarios but also for a real industrial data set, and compared with the monitoring abilities of the standard PLS method on the quantitative basis. The results clearly showed that the MSPLS was superior to the standard PLS for all cases especially in that it could provide additional scale-level information about the fault characteristics as well as more sensitive fault detection ability.  相似文献   
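The first MSPLS step, splitting each measurement into scale components, can be illustrated with a plain Haar wavelet decomposition. This is a sketch under assumptions: the abstract does not name the wavelet or describe the on-line transform, so the batch Haar transform and the function names below are illustrative choices only.

```python
import math

def haar_step(signal):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail

def multiscale_blocks(signal, levels):
    """Split a signal into [detail_1, ..., detail_L, approx_L] scale blocks."""
    blocks = []
    current = list(signal)
    for _ in range(levels):
        current, detail = haar_step(current)
        blocks.append(detail)   # fine-scale variation removed at this level
    blocks.append(current)      # remaining coarse-scale trend
    return blocks

blocks = multiscale_blocks([4.0, 2.0, 5.0, 5.0, 1.0, 3.0, 0.0, 2.0], levels=2)
```

Each block would then feed one data block of a multi-block PLS model; in practice the blocks are usually reconstructed to a common length first, a step omitted here.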

7.
Comparisons of prediction models from the new augmented classical least squares (ACLS) and partial least squares (PLS) multivariate spectral analysis methods were conducted using simulated data containing deviations from the idealized model. The simulated data were based on pure spectral components derived from real near-infrared spectra of multicomponent dilute aqueous solutions. Simulated uncorrelated concentration errors, uncorrelated and correlated spectral noise, and nonlinear spectral responses were included to evaluate the methods on situations representative of experimental data. The statistical significance of differences in prediction ability was evaluated using the Wilcoxon signed rank test. The prediction differences were found to be dependent on the type of noise added, the numbers of calibration samples, and the component being predicted. For analyses applied to simulated spectra with noise-free nonlinear response, PLS was shown to be statistically superior to ACLS for most of the cases. With added uncorrelated spectral noise, both methods performed comparably. Using 50 calibration samples with simulated correlated spectral noise, PLS showed an advantage in 3 out of 9 cases, but the advantage dropped to 1 out of 9 cases with 25 calibration samples. For cases with different noise distributions between calibration and validation, ACLS predictions were statistically better than PLS for two of the four components. Also, when experimentally derived correlated spectral error was added, ACLS gave better predictions that were statistically significant in 15 out of 24 cases simulated. On data sets with nonuniform noise, neither method was statistically better, although ACLS usually had smaller standard errors of prediction (SEPs). The varying results emphasize the need to use realistic simulations when making comparisons between various multivariate calibration methods. 
Even when the differences between the standard errors of prediction were statistically significant, in most cases the differences in SEP were small. This study demonstrated that, unlike CLS, ACLS is competitive with PLS in modeling nonlinearities in spectra without knowledge of all the component concentrations. This competitiveness is important when maintaining and transferring models for system drift, spectrometer differences, and unmodeled components, since ACLS models can be rapidly updated during prediction when used in conjunction with the prediction augmented classical least squares (PACLS) method, while PLS requires full recalibration.

8.
This contribution introduces Elastic Component Regression (ECR) as an explorative data analysis method that utilizes a tuning parameter α ∈ [0,1] to supervise the X-matrix decomposition. It is demonstrated theoretically that the elastic component resulting from ECR coincides with principal components of PCA when α = 0 and also coincides with PLS components when α = 1. In this context, PCR and PLS occupy the two ends of ECR and α ∈ (0,1) will lead to an infinite number of transitional models which collectively uncover the model path from PCR to PLS. Therefore, the framework of ECR shows a natural progression from PCR to PLS and may help add some insight into their relationships in theory. The performance of ECR is investigated on a series of simulated datasets together with a real world near infrared dataset. (The source codes implementing ECR in MATLAB are freely available at http://code.google.com/p/ecr/.)  相似文献   

9.
In this study, multi-objective genetic algorithms (GAs) are introduced to partial least squares (PLS) model building. This method aims to improve the performance and robustness of the PLS model by removing samples with systematic errors, including outliers, from the original data. Multi-objective GA optimizes the combination of these samples to be removed. Training and validation sets were used to reduce the undesirable effects of over-fitting on the training set by multi-objective GA. The reduction of the over-fitting leads to accurate and robust PLS models. To clearly visualize the factors of the systematic errors, an index defined with the original PLS model and a specific Pareto-optimal solution is also introduced. This method is applied to three kinds of near-infrared (NIR) spectra to build PLS models. The results demonstrate that multi-objective GA significantly improves the performance of the PLS models. They also show that the sample selection by multi-objective GA enhances the ability of the PLS models to detect samples with systematic errors.  相似文献   

10.
Fifteen pure molecular chemicals were used to transfer near-IR partial least squares (PLS) models of jet fuel properties between two dispersive near-IR instruments by a novel calibration transfer (standardization) method, the Segmented Virtual Standards Slope and Bias method (SVSSB). PLS was applied to establish models for quantitative analysis of jet fuel properties. The modeled jet fuel properties include: API gravity; %aromatics; cetane index; density; distillation temperatures for 10%, 20%, 50% and 90% recovered volume; flashpoint; freeze point; %hydrogen content; %saturates; and viscosity. The transfer of the PLS models requires that spectra of only 15 pure chemicals be acquired on the primary and secondary instruments. The spectra of the chemicals are then segmented into distinct spectral regions, which are subsequently used to digitally construct spectra of virtual standards that mimic jet fuel spectra in the training set. The resulting virtual standards for the primary and secondary instruments are then predicted using the PLS models, and the prediction values are regressed to provide a simple but effective slope and bias correction for transfer. SVSSB calibration transfer performs better than Piecewise Direct Standardization (PDS) for 7 jet fuel properties; for cetane index, for example, the root mean square errors of prediction (RMSEPc) of the SVSSB-corrected and PDS-corrected secondary instrument relative to the primary instrument prediction are 0.19 and 0.27, respectively. SVSSB and PDS show comparable performance for the other 6 jet fuel properties; for %hydrogen content, for example, the corresponding RMSEPc values are 0.015 and 0.014, respectively. SVSSB performs as well as using real jet fuel standards to generate a slope and bias correction, and also as well as conventional PDS, while eliminating the need to maintain either the complex fuel standards or the primary instrument.
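The final step of the transfer, regressing the two sets of predictions to obtain a slope and bias correction, is ordinary univariate linear regression. A minimal sketch is given below; the function names are hypothetical, and the direction of the regression (secondary-instrument predictions mapped onto primary-instrument predictions) is an assumption about how the correction is applied.

```python
def fit_slope_bias(secondary_pred, primary_pred):
    """Least-squares line mapping secondary-instrument predictions onto primary ones."""
    n = len(secondary_pred)
    if n < 2 or n != len(primary_pred):
        raise ValueError("need at least two paired predictions")
    mean_s = sum(secondary_pred) / n
    mean_p = sum(primary_pred) / n
    sxx = sum((s - mean_s) ** 2 for s in secondary_pred)
    sxy = sum((s - mean_s) * (p - mean_p)
              for s, p in zip(secondary_pred, primary_pred))
    slope = sxy / sxx
    bias = mean_p - slope * mean_s
    return slope, bias

def apply_correction(predictions, slope, bias):
    """Correct new secondary-instrument predictions with the fitted line."""
    return [slope * v + bias for v in predictions]
```

In the workflow described above, the paired predictions would come from the virtual standards rather than from real fuel samples.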

11.
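Determination of total catechin content in tea polyphenols by near-infrared spectroscopy   Cited by: 21 (self-citations: 7; citations by others: 21)
Using high-performance liquid chromatography (HPLC) results as reference values, a near-infrared spectroscopy calibration model was established for the rapid measurement of total catechin content in tea polyphenols. Forty-eight tea polyphenol samples formed the calibration set. Near-infrared diffuse reflectance spectra in the range 1000~2500 nm (4000~10000 cm⁻¹) were used as the calibration wavelength range. The spectra were pre-processed by first derivative, second derivative, standard normal variate (SNV), and multiplicative scatter correction (multiplicative signal correction, MSC), and then calibrated by partial least squares (PLS) regression. Internal cross-validation showed that the model built on SNV-processed spectra performed best, with a correlation coefficient of 0.997 and a root mean square error of calibration (RMSEC) of 1.71%. Models built by classical least squares (CLS), PLS, and principal component regression (PCR) were compared, and PLS regression gave the best modeling results.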
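Of the pre-processing steps listed, standard normal variate (SNV) is the simplest to illustrate: each spectrum is centered on its own mean and scaled by its own standard deviation. The sketch below assumes the sample standard deviation (n - 1 denominator), which is the common convention; the study does not state which variant was used.

```python
from statistics import mean, stdev

def snv(spectrum):
    """Standard normal variate: center and scale one spectrum by its own statistics."""
    m = mean(spectrum)
    s = stdev(spectrum)   # sample standard deviation (n - 1 denominator)
    if s == 0:
        raise ValueError("constant spectrum cannot be SNV-scaled")
    return [(v - m) / s for v in spectrum]

def snv_all(spectra):
    """Apply SNV independently to every spectrum (row) of a data set."""
    return [snv(row) for row in spectra]
```

Because each row is standardized on its own, a spectrum and a copy of it with a constant offset and a positive multiplicative gain map to the same SNV result, which is why SNV is used against scatter effects in diffuse reflectance data.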

12.
A Kalman filter was developed to overcome the problems caused by process drifting. Different types of models were used to predict response variables of an activated sludge waste-water treatment plant. These models were constructed using MLR, PCR, and PLS. The MLR-type regression coefficients were calculated for both the PCR and PLS models. After that, the Kalman filter was used to estimate these coefficients, recursively. Both the PCR and PLS `inner relation' coefficient vectors were also estimated in this way and the results were then compared. The effect of the number of variables was also briefly studied. The testing was carried out using sequential process data. The prediction ability was measured by a Q2-value as a function of a lag in the updating of the coefficients.  相似文献   
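The recursive estimation of regression coefficients can be sketched with a standard Kalman filter in which the coefficients follow a random walk and each new sample is a scalar measurement. This is an assumed formulation for illustration only: the abstract does not give the state model, and the noise settings `q` and `r` and the function name are hypothetical.

```python
def kalman_update(b, P, x, y, q=1e-4, r=1.0):
    """One Kalman step for y = x . b + noise, with random-walk coefficients b.

    b: current coefficient estimate, P: its covariance (list of lists),
    x: regressor vector, y: observed response,
    q: process-noise variance (drift), r: measurement-noise variance.
    """
    n = len(b)
    # Predict: the random walk leaves b unchanged and inflates the covariance.
    P = [[P[i][j] + (q if i == j else 0.0) for j in range(n)] for i in range(n)]
    Px = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
    s = sum(x[i] * Px[i] for i in range(n)) + r        # innovation variance
    K = [v / s for v in Px]                            # Kalman gain
    innovation = y - sum(x[i] * b[i] for i in range(n))
    b_new = [b[i] + K[i] * innovation for i in range(n)]
    # Update covariance: P - K x^T P (P is symmetric, so x^T P equals Px^T).
    P_new = [[P[i][j] - K[i] * Px[j] for j in range(n)] for i in range(n)]
    return b_new, P_new

# Illustrative run on noise-free synthetic data (values assumed, not from the paper).
b_est = [0.0, 0.0]
P_est = [[1e3, 0.0], [0.0, 1e3]]
for k in range(50):
    x_k = [1.0, 0.1 * k]
    y_k = 1.5 * x_k[0] - 0.5 * x_k[1]
    b_est, P_est = kalman_update(b_est, P_est, x_k, y_k, q=0.0, r=1e-3)
```

With `q = 0` the filter reduces to recursive least squares; a positive `q` lets the coefficients track process drift at the cost of noisier estimates.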

13.
Prediction of chemical composition of flowing liquids using passive acoustic measurements and multivariate regression (acoustic chemometrics) has been reported as a promising in-line measurement method. However, the passive acoustic measurement results are also affected directly or indirectly by other factors than composition of the liquid, i.e. physical conditions of the flow and equipment/pipe properties. The present study focuses on the effects of flow rate, accelerometer location and temperature on the acoustic spectra and prediction of composition of liquids. The studied liquids were two-component mixtures of sucrose and water, and three-component mixtures of ethanol, sucrose and water. Multivariate models were estimated using both local and global calibration on full spectra, and augmented frequency and amplitude matrices derived from full spectra. Flow rate and accelerometer location had the most pronounced effect on acoustic spectra and prediction results from recalibrated local models. Temperature had a minor effect on the acoustic spectra and prediction results. The prediction error for determination of ethanol, sucrose and water increased with increasing flow rate. Changes in flow rate resulted in considerable spectral variations, causing the resultant local calibration model to perform poorly predicting the new samples taken at other flow conditions. Global models performed well on prediction of liquid composition at all studied flow and temperature levels. The global models, however, needed higher number of PLS factors and led to higher prediction errors compared to local models. Using the augmented frequency and amplitude matrices in PLS/PPLS global regression models led to higher prediction errors compared to full spectra models. However, the augmented frequency and amplitude models were more parsimonious (4–6 PLS factors) compared to the full spectra models (10–12 PLS factors).  相似文献   

14.
This paper proposes a new method for exploratory analysis and the interpretation of latent structures. The approach is named missing-data methods for exploratory data analysis (MEDA). The MEDA approach can be applied in combination with several models, including Principal Components Analysis (PCA), Factor Analysis (FA) and Partial Least Squares (PLS). It can be seen as a substitute for rotation methods with better properties: it is more accurate than rotation methods in detecting relationships between pairs of variables, it is robust to overestimation of the number of PCs, and it does not depend on the normalization of the loadings. MEDA is useful for inferring the structure in the data and for interpreting the contribution of each latent variable. The interpretation of PLS models with MEDA, including variable selection, may be especially valuable for the chemometrics community. The use of MEDA with PCA and PLS models is demonstrated with several simulated and real examples.

15.
Spectral pre-processing and variable selection are often used to produce PLS regression models with better prediction abilities. We propose here to optimize the spectral pre-processing and the variable selection simultaneously for PLS regression. The method is based on a parallel genetic algorithm with a single chromosome coding both the pre-processing and the variable selection. A pool of 31 pre-processing functions with various settings is tested. Several pre-processing steps can be combined in the same chromosome. Three near-infrared spectroscopic datasets have been used to evaluate the methodology. The efficacy of the co-optimization is evaluated by comparing the prediction ability of the resulting PLS models with that of models obtained after pre-processing optimization only. The effect of the number of successive pre-processing steps has also been tested. Two different behaviors are observed across the datasets used here. In the first case, the GA co-optimization procedure performs well, leading to an important improvement in prediction ability, especially when three consecutive pre-processing techniques are applied. In the second case, pre-processing optimization alone is enough to obtain an optimal model. All these models are optimal and more accurate than the classical models (built with "trial and error" methods).

16.
Several physicochemical parameters of Brazilian commercial gasoline, such as relative density, distillation curve (temperatures related to 10%, 50% and 90% of distilled volume, final boiling point and residue), octane numbers (motor and research octane number and anti-knock index), hydrocarbon composition (olefins, aromatics and saturates), and anhydrous ethanol and benzene content, were predicted from chromatographic profiles obtained by gas chromatography with flame ionization detection (GC-FID) using partial least squares (PLS) regression. GC-FID is a technique used intensively for fuel quality control due to its convenience, speed, accuracy and simplicity, and its profiles are much easier to interpret and understand than results produced by other techniques. Another advantage is that it permits association with multivariate methods of analysis, such as PLS. The chromatogram profiles were recorded and used to develop PLS models for each property. The standard error of prediction (SEP) was the main parameter considered in selecting the "best model". Most of the GC-FID-PLS results were very good when compared to those obtained by the specification methods of the Brazilian Government Petroleum, Natural Gas and Biofuels Agency (ANP) Regulation 309. In general, all PLS models developed in this work provide unbiased predictions with low standard error of prediction and percentage average relative error (below 11.5 and 5.0, respectively).

17.
The purpose of this study was to predict drug content and hardness of intact tablets using artificial neural networks (ANN) and near-infrared spectroscopy (NIRS). Tablets for the drug content study were compressed from mixtures of Avicel® PH-101, 0.5% magnesium stearate, and varying concentrations (0%, 1%, 2%, 5%, 10%, 20%, and 40% w/w) of theophylline. Tablets for the hardness study were compressed from mixtures of Avicel PH-101 and 0.5% magnesium stearate at varying compression forces ranging from 0.4 to 1 ton. An Intact Analyzer™ was used to obtain near infrared spectra from the tablets with varying drug contents, whereas a Rapid Content Analyzer™ (RCA) was used to obtain spectral data from the tablets with varying hardness. Two sets of tablets from each batch (i.e., tablets with varying drug content and hardness) were randomly selected. One set of tablets was used to generate appropriate calibration models, while the other set was used as the unknown (test) set. A total of 10 ANN calibration models (5 each with 10 and 160 inputs at appropriate wavelengths) and five separate 4-factor partial least squares (PLS) calibration models were generated to predict drug contents of the test tablets from the spectral data. For the prediction of tablet hardness, two ANN calibration models (one each with 10 and 160 inputs) and two 4-factor PLS calibration models were generated and used to predict the hardness of test tablets. The PLS calibration models were generated using Vision® software. Prediction of drug contents of test tablets using the ANN calibration models generated with 10 inputs was significantly better than the prediction obtained with the ANN calibration models with 160 inputs. For tablets with low drug concentrations (less than or equal to 2%w/w), prediction of drug content was better with either of the two ANN calibration models than with the PLS calibration models. 
However, prediction of drug contents of tablets with greater than or equal to 5% w/w drug was better with the PLS calibration models than with the ANN calibration models. Prediction of tablet hardness was better with the ANN calibration models generated with either 10 or 160 inputs than with the PLS calibration models. This work demonstrated that a well-trained ANN model is a powerful alternative technique for analysis of NIRS data. Moreover, the technique could be used in instances when the conventional modeling of data does not work adequately.  相似文献   

18.
The combination of Raman and infrared spectroscopy on the one hand and wavelength selection on the other hand is used to improve the partial least-squares (PLS) prediction of seven selected yarn properties. These properties are important for on-line quality control during production. From 71 yarn samples, the Raman and infrared spectra are measured and reference methods are used to determine the selected properties. Making separate PLS models for all yarn properties using the Raman and infrared spectra, prior to wavelength selection, reveals that Raman spectroscopy outperforms infrared spectroscopy. If wavelength selection is applied, the PLS prediction error decreases and the correlation coefficient increases for all properties. However, a substantial wavelength selection effect is present for the infrared spectra compared to the Raman spectra. For the infrared spectra, wavelength selection results in PLS prediction errors comparable with the prediction performance of the Raman spectra prior to wavelength selection. Concatenating the Raman and infrared spectra does not enhance the PLS prediction performance, not even after wavelength selection. It is concluded that an infrared spectrometer, combined with a wavelength selection procedure, can be used if no (suitable) Raman instrument is available.  相似文献   

19.
The ex vivo removal of urea during hemodialysis treatments is monitored in real time with a noninvasive near-infrared spectrometer. The spectrometer uses a temperature-controlled acousto-optical tunable filter (AOTF) in conjunction with a thermoelectrically cooled extended wavelength InGaAs detector to provide spectra with a 20 cm⁻¹ resolution over the combination region (4000-5000 cm⁻¹) of the near-infrared spectrum. Spectra are signal averaged over 15 seconds to provide root mean square noise levels of 24 micro-absorbance units for 100% lines generated over the 4600-4500 cm⁻¹ spectral range. Combination spectra of the spent dialysate stream are collected in real time as a portion of this stream passes through a sample holder constructed from a 1.1 mm inner diameter tube of Teflon. Real-time spectra are collected during 17 individual dialysis sessions over a period of 10 days. Reference samples are extracted periodically during each session to generate 87 unique samples with corresponding reference concentrations for urea, glucose, lactate, and creatinine. A series of calibration models are generated for urea by using the partial least squares (PLS) algorithm, and each model is optimized in terms of number of factors and spectral range. The best calibration model gives a standard error of prediction (SEP) of 0.30 mM based on a random splitting of spectra generated from all 87 reference samples collected across the 17 dialysis sessions. PLS models are also developed by using spectra collected in early sessions to predict urea concentrations from spectra collected in subsequent sessions. SEP values for these prospective models range from 0.37 mM to 0.52 mM. Although higher than when spectra are pooled from all 17 sessions, these prospective SEP values are acceptable for monitoring the hemodialysis process. Selectivity for urea is demonstrated, and the selectivity properties of the PLS calibration models are characterized with a pure component selectivity analysis.

20.
Alternative methods for quality control in the petroleum industry have been obtained using Near-infrared Spectroscopy (NIRS) combined with multivariate techniques such as Partial Least Squares (PLS). The process of development and refinement of PLS models usually follows a nonsystematic and univariate procedure. The Standard Error of Cross Validation (SECV), the Standard Error of Prediction (SEP) and the determination coefficient of the regression (r²regr.) are usually the only guides used in pursuit of the best model. In the present work, a novel approach is proposed using a Doehlert experimental design with three input variables (wavenumber range, preprocessing technique and regression/validation technique) varied at 5, 7 and 3 levels, respectively. Besides SECV, SEP and r²regr., some additional response variables, such as the slope, r² and p-value from the external validation, as well as the number of PLS factors, were simultaneously assessed to find the optimum conditions for PLS modeling. The optimum setting for each input variable was simultaneously defined through a multivariate approach using a desirability function. With the proposed approach, the main and interaction effects could also be investigated. The methodology was successfully applied to obtain PLS models to monitor gasoline quality during the loading of product into trucks. To prevent product contamination or adulteration, fast prediction of key properties was obtained from FT-NIR spectra within the 7300-3900 cm⁻¹ region, with SECV in the range 0.04-0.63% w/w for composition (Aromatics, Saturates, Olefins and Benzene) and 0.0008 for Relative Density 20/4 °C. Each optimized PLS model was obtained with fewer than 40 modeling runs, demonstrating the efficiency of the proposed approach.
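Combining several responses through a desirability function can be sketched as follows. The abstract does not say which form was used, so this example assumes the common Derringer-type one-sided desirabilities combined by a geometric mean; the limits, weights and function names are hypothetical.

```python
import math

def d_minimize(y, low, high, weight=1.0):
    """Desirability for a smaller-is-better response (e.g. SECV, number of factors)."""
    if y <= low:
        return 1.0
    if y >= high:
        return 0.0
    return ((high - y) / (high - low)) ** weight

def d_maximize(y, low, high, weight=1.0):
    """Desirability for a larger-is-better response (e.g. r squared)."""
    if y <= low:
        return 0.0
    if y >= high:
        return 1.0
    return ((y - low) / (high - low)) ** weight

def overall_desirability(values):
    """Geometric mean of individual desirabilities; zero if any response is unacceptable."""
    if not values:
        raise ValueError("need at least one desirability value")
    return math.prod(values) ** (1.0 / len(values))
```

Each modeling run in the experimental design would be scored by `overall_desirability`, and the run (or fitted response surface point) with the highest score taken as the optimum. The geometric mean makes a single unacceptable response veto the whole setting.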

