The statistical analysis of compositional data is of fundamental importance to practitioners in general and to chemists in particular. The existing methodology is principally due to Aitchison, who effectively uses two transformations, a ratio followed by the logarithmic, to create a useful, coherent theory that in principle allows the plethora of normal-based multivariate techniques to be used on the transformed data. This paper suggests that the well-known class of Box-Cox transformations can be employed in place of the logarithmic to significantly improve the existing methodology. This is supported in part by showing that one of the most basic problems that Aitchison managed to overcome, namely the specification of an interpretable covariance structure for compositional data, can be resolved, or nearly resolved, once the ratio transformation has been applied. Hence the resolution is not directly dependent on the logarithmic transformation. It is then verified that access to the general Box-Cox family will allow a more accurate use of the normal-based multivariate techniques, simply because better fits to normality can be achieved. Finally, maximum likelihood estimation and some associated asymptotics are employed to construct confidence intervals for ratios of the true, unknown compositional constituents. Heretofore this had not been done even in the context of the logarithmic transformation. Applications to real data are presented.
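The two-step idea described above (a ratio transformation followed by a Box-Cox transformation in place of the logarithm) can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the last part of the composition is assumed to be the reference divisor, and the power parameter `lam` is supplied by hand rather than estimated by maximum likelihood as the abstract describes.

```python
import math


def box_cox(y, lam):
    """Box-Cox transform of a positive value y; lam == 0 gives the logarithm."""
    if y <= 0:
        raise ValueError("Box-Cox requires a positive argument")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam


def ratio_box_cox(composition, lam):
    """Divide every part by the last part (assumed reference), then transform."""
    *parts, reference = composition
    return [box_cox(p / reference, lam) for p in parts]


sample = [0.2, 0.3, 0.5]                      # a three-part composition summing to 1
log_ratios = ratio_box_cox(sample, 0.0)       # lam = 0: the log-ratio special case
near_log = ratio_box_cox(sample, 1e-6)        # small lam approaches the log-ratio
shifted_ratios = ratio_box_cox(sample, 1.0)   # lam = 1: ratios shifted by -1
```

With `lam = 0` the sketch reduces to a log-ratio transformation, so the logarithmic case sits inside the wider family as one member.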


The utility of nonmetric, multidimensional-scaling techniques is demonstrated for the analysis and collection of environmental-cognition data. By comparing the multidimensional-scaling solutions of a real-setting map to scaling solutions for sketch maps and two psychophysical, distance-scaling procedures, we demonstrate that magnitude estimation of actual interpoint distances is comparable in accuracy to sketch maps when produced without constraints, or when subjects are given a specified list of landmarks to include on their maps. Triadic comparisons of actual interpoint distances were less accurate than the three other techniques.

Abstract factor analyses were performed on databases consisting of simulated samples from aqueous equilibria. The program COMPLEX was used to generate equilibrium species in a system of three reactant metals and five reactant bases. Reactant concentrations and pH were drawn from random-normal distributions so that sample data vectors comprised a multivariate log-normal distribution of equilibrium concentrations. In addition, sample groups were created containing different distributions for pH and reactant concentrations. Equilibrium species were shown to contain variance contributed by change in pH among samples as well as change in reactant concentrations. Factor modelling revealed the qualitative relationships among the species and how the relationships change with pH. Factors also revealed those reactants containing variance in the data matrix. In some cases, reactant variance obscured relationships between pH and the equilibrium species. Since factor modelling of a simulated data matrix revealed the expected chemical equilibrium interactions, a potentially powerful tool exists for investigating the effects of outliers and error.

Digital filter smoothing methods for shot-noise-limited data are addressed in this study. The preferred method is based on a Gaussian filter in which the width of the Gaussian filter function is varied depending on the estimate of the second derivative of the raw data. This filter is developed from the standpoint of maximum likelihood parameter estimation of the probability density function which describes shot-noise-limited data. The smoothing filter is tested and compared with the conventional sequential regression filter. This adaptive Gaussian smoothing filter works better than both the sequential regression and the adaptive Gaussian filter derived for normal noise. For data containing both high- and low-frequency components, the limiting step in the adaptive filter is an estimation of the smoothing interval. Methods for determining an optimum smoothing interval are discussed. With the optimized smoothing interval, the adaptive Gaussian filter works well for data sets with a wide range of varying frequency components. In particular, synthetic data typical of atomic emission spectra are used to test this smoothing filter.

A general equation to derive kinetic models up to any order is given. This equation greatly facilitates the application of the Taylor series method to the integration of kinetic models up to very high orders. When dealing with non-stiff models, computing time is always reduced by increasing the integration order, at least up to the 20th order. When the model is stiff, the integration order should be optimized; however, a twelfth order is recommended to integrate weakly stiff models. The use of an algorithm which permits the immediate calculation of the integration step size required to maintain a given accuracy leads to further reductions in computing time. When implemented as recommended here, a high-order Taylor series method is more rapid and accurate than Runge-Kutta and predictor-corrector methods and can be advantageously used in combination with optimization methods to perform mechanism studies and in multicomponent kinetic determinations.
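To illustrate why a higher integration order can pay off, here is a minimal sketch of a Taylor series step for a toy first-order decay, dc/dt = -k c, whose n-th derivative is simply (-k)^n c. The model, the function names and the fixed step size are assumptions made for illustration; the abstract's general derivative equation and its step-size algorithm are not reproduced here.

```python
import math


def taylor_step(c, k, h, order):
    """Advance dc/dt = -k*c by one step h using a Taylor series of given order."""
    term = c      # n = 0 term: c * (-k*h)**0 / 0!
    total = c
    for n in range(1, order + 1):
        term *= -k * h / n   # turns the (n-1)-th term into c * (-k*h)**n / n!
        total += term
    return total


def integrate(c0, k, t_end, steps, order):
    """Integrate from t = 0 to t_end with a fixed number of equal steps."""
    h = t_end / steps
    c = c0
    for _ in range(steps):
        c = taylor_step(c, k, h, order)
    return c


exact = math.exp(-1.0 * 2.0)                       # analytical solution at t = 2
low_order = integrate(1.0, 1.0, 2.0, 4, order=2)   # 4 steps, 2nd order
high_order = integrate(1.0, 1.0, 2.0, 4, order=12) # 4 steps, 12th order
```

With the same four steps, the twelfth-order result is far closer to the analytical value than the second-order one, which is the trade-off between order and step count that the abstract refers to.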

Exploratory data analysis (EDA) is a toolbox of data manipulation methods for looking at data to see what they seem to say, i.e. one tries to let the data speak for themselves. In this way there is hope that the data will lead to indications about 'models' of relationships not expected a priori. In this respect EDA is a pre-step to confirmatory data analysis which delivers measures of how adequate a model is. In this tutorial the focus is on multivariate exploratory data analysis for quantitative data using linear methods for dimension reduction and prediction. Purely graphical multivariate tools such as 3D rotation and scatterplot matrices are discussed after having introduced the univariate and bivariate tools on which they are based. The main tasks of multivariate exploratory data analysis are identified as 'search for structure' by dimension reduction and 'model selection' by comparing predictive power. Resampling is used to support validity, and variables selection to improve interpretability.

The results of unsupervised pattern recognition methods are critically dependent on the measure of similarity used for clustering objects. There is little a priori information available on the relative utility of various similarity measures. We introduce here an alternative similarity measure based on the metric tensor measure (MTM). Two standard clustering strategies are tested with the proposed similarity measure: hierarchical clustering and the K-median method. As data we use the ARCH obsidian data, a data set on Hungarian coal, and trace element data on Hungarian paprika. Differences from the Mahalanobis distance measure are described for intraclass relations.

A simple algorithm for deconvolution and regression of shot-noise-limited data is illustrated in this paper. The algorithm is easily adapted to almost any model and converges to the global optimum. Multiple-component spectrum regression, spectrum deconvolution and smoothing examples are used to illustrate the algorithm. The algorithm and a method for determining uncertainties in the parameters based on the Fisher information matrix are given and illustrated with three examples. An experimental example of spectrograph grating order compensation of a diode array solar spectroradiometer is given to illustrate the use of this technique in environmental analysis. The major advantages of the EM algorithm are found to be its stability, simplicity, conservation of data magnitude and guaranteed convergence.
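The abstract does not state its update rule, so the following is a sketch of the textbook EM iteration for a linear model with Poisson (shot) noise, applied to multiple-component spectrum regression. The function name, the two component spectra and the noiseless test data are all hypothetical; the sketch is not claimed to be the authors' implementation.

```python
def em_poisson_fit(data, components, iterations=200):
    """Estimate non-negative amplitudes a_j so that sum_j a_j * s_j fits the data.

    Uses the multiplicative EM update for Poisson-distributed counts:
        a_j <- a_j * sum_i(s_ij * y_i / m_i) / sum_i(s_ij),
    where m_i is the current model value in channel i.
    """
    amps = [1.0] * len(components)
    col_sums = [sum(c) for c in components]
    channels = range(len(data))
    for _ in range(iterations):
        model = [sum(a * c[i] for a, c in zip(amps, components)) for i in channels]
        amps = [
            a * sum(c[i] * data[i] / model[i] for i in channels if model[i] > 0) / s
            for a, c, s in zip(amps, components, col_sums)
        ]
    return amps


# Two overlapping component spectra (hypothetical) and noiseless mixed data.
s1 = [1.0, 2.0, 1.0, 0.0, 0.0]
s2 = [0.0, 0.0, 1.0, 2.0, 1.0]
data = [3.0 * a + 5.0 * b for a, b in zip(s1, s2)]

amps = em_poisson_fit(data, [s1, s2])

# After a single iteration the fitted model already has the same total as the data.
one_step = em_poisson_fit(data, [s1, s2], iterations=1)
model_total = sum(a * sum(c) for a, c in zip(one_step, [s1, s2]))
```

The last two lines illustrate the "conservation of data magnitude" property: each update leaves the summed model equal to the summed data, and the amplitudes stay non-negative because the update is multiplicative.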

A procedure called GOLPE is suggested in order to detect those variables which increase the predictivity of PLS models. The procedure is based on evaluating the predictive power of a number of PLS models built by different combinations of variables selected according to a factorial design strategy. Examples are given of the efficiency of this variable selection procedure, which shows how these predictive PLS models are better than those obtained by all variables and better than the corresponding ordinary regression models.

Partial least squares (PLS) regression is a commonly used statistical technique for performing multivariate calibration, especially in situations where there are more variables than samples. Choosing the number of factors to include in a model is a decision that all users of PLS must make, but is complicated by the large number of empirical tests available. In most instances predictive ability is the most desired property of a PLS model and so interest has centred on making this choice based on an internal validation process. A popular approach is the calculation of a cross-validated r² to gauge how much variance in the dependent variable can be explained from leave-one-out predictions. Using Monte Carlo simulations for different sizes of data set, the influence of chance effects on the cross-validation process is investigated. The results are presented as tables of critical values which are compared against the values of cross-validated r² obtained from the user's own data set. This gives a formal test for predictive ability of a PLS model with a given number of dimensions.
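The cross-validated r² mentioned above is commonly computed as 1 - PRESS/SS, where PRESS is the sum of squared leave-one-out prediction errors and SS is the sum of squares of the response about its mean. The sketch below shows only that leave-one-out bookkeeping. To stay self-contained it substitutes a one-variable least-squares line for the PLS model, which is a deliberate simplification and not the method of the abstract; all names and data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def loo_r2(xs, ys):
    """Cross-validated r^2 = 1 - PRESS / SS from leave-one-out predictions."""
    mean_y = sum(ys) / len(ys)
    press = 0.0
    for i in range(len(xs)):
        # Refit the model without sample i, then predict sample i.
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    ss = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
perfect = loo_r2(xs, [2.0 * x + 1.0 for x in xs])   # exactly linear response
noisy = loo_r2(xs, [1.0, 3.0, 2.0, 5.0, 4.0])       # scattered response
```

In the setting of the abstract, a value such as `noisy` would then be compared against a tabulated critical value for the given data set size before the model is accepted as predictive.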

Application of principal component analysis to Cu(II)-ethanolamine complex formation data is shown. Determination of the number of complex species is obtained from the rank of the matrix of spectral data using either Gauss elimination or factorial analysis. Relevant information concerning species distribution versus pH is obtained from the plot of the significant factors emerging from the evolution of spectral titration data.
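The rank argument can be sketched with plain Gaussian elimination: if every measured spectrum is a linear combination of the spectra of a few absorbing species, the rank of the data matrix equals the number of species. The function name, the tolerance and the tiny noiseless data matrix are assumptions for illustration; with real, noisy spectra the tolerance would have to reflect the measurement error.

```python
def matrix_rank(rows, tol=1e-9):
    """Rank of a matrix by Gaussian elimination with partial pivoting."""
    m = [[float(v) for v in row] for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Choose the largest remaining entry in this column as the pivot.
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) <= tol:
            continue  # column carries no new independent information
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank


# Four hypothetical spectra (rows) at three wavelengths, each a mixture of
# two species with spectra [1, 0, 2] and [0, 1, 1].
spectra = [
    [1, 0, 2],
    [0, 1, 1],
    [1, 1, 3],
    [2, 1, 5],
]
n_species = matrix_rank(spectra)
```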

Concurrent water data were collected by the U.S. Geological Survey, the U.S. Corps of Army Engineers, and the Maryland Department of Health and Mental Hygiene at the same site in order to describe present conditions and predict future environmental change in the Georges Creek basin in western Maryland. Evaluation of the data sets reveals measurement errors and weaknesses in sample design so that published compilations contain significant errors. Unless these errors can be identified, policy based on such information may have unfortunate results.

Each eigenvector of the dispersion matrix [X]^T[X] was shown to be a partial predictor of the original data matrix [X], the sum of the predictions from the individual principal components being equal to the expectance of [X]. By comparing the distributions of the members of two neighbouring predicted matrices, [X]_(1...i) and [X]_(1...i+1) (i.e. the sums of the first i and i+1 individual predictions respectively), it was shown that they should be indistinguishable provided that i is equal to or greater than the effective rank of [X], and significantly different otherwise. This was confirmed by analysing the visible absorption spectra of methyl orange and methyl red solutions as well as the Raman spectra of Na₂SO₄ and MgSO₄ solutions. On the grounds of these findings, a non-parametric goodness-of-fit test for assessing the effective rank of [X] was proposed which proved to be comparatively conservative and more robust than most currently used tests.

The UV spectra of mixture solutions consisting of tyrosine, tryptophane, phenylalanine, cystine, histidine and 3,4-dihydroxyl phenylalanine have been measured. The numbers, identities and concentrations of the amino acids in the mixtures have been determined successfully using target factor analysis. The effects of the wavelength range and the selected sampling interval on the results are discussed. Twenty-five synthetic mixture samples have been analysed successfully. The average recoveries are 98·9 for Tyr, 96·5 for Trp, 105·6 for Phe, 98·1 for Cys, 98·9 for His and 106·4 for Di-phe. The results obtained are in good agreement with those obtained by the Kalman filter method.

This work analyses a storm that occurred in the Canary Islands early in November 1826. Through a study based on historical climate data, some of the adverse effects of the storm are described and some of the possible causes are discussed. The main goal of this work is to establish an approximate reconstruction of this historical event which will allow us to compare it to a recent meteorological event that had a great impact on the archipelago: “Tropical Storm Delta”, in November 2005. Studying and reviewing the origin of the 1826 storm verifies the hypothesis that extremely violent perturbations have not only occurred in the Canaries on other occasions, but that these past events were also more intense and had more serious consequences than Delta. Therefore, the idea that other tropical perturbations have occurred in the region of the Canary Islands before Delta is presented.

