Least Squares Regression

Kinds of Least Squares Regression

  • ordinary least squares regression
  • partial least squares regression


Selected Abstracts


    Predicting pasture root density from soil spectral reflectance: field measurement

    EUROPEAN JOURNAL OF SOIL SCIENCE, Issue 1 2010
    B. H. KUSUMO
    This paper reports the development and evaluation of a field technique for in situ measurement of root density using a portable spectroradiometer. The technique was evaluated at two sites in permanent pasture on contrasting soils (an Allophanic and a Fluvial Recent soil) in the Manawatu region, New Zealand. Using a modified soil probe, reflectance spectra (350–2500 nm) were acquired from horizontal surfaces at three depths (15, 30 and 60 mm) of an 80-mm diameter soil core, totalling 108 samples for both soils. After scanning, 3-mm soil slices were taken at each depth for root density measurement and soil carbon (C) and nitrogen (N) analysis. The two soils exhibited a wide range of root densities from 1.53 to 37.03 mg dry root g-1 soil. The average root density in the Fluvial soil (13.21 mg g-1) was twice that in the Allophanic soil (6.88 mg g-1). Calibration models, developed using partial least squares regression (PLSR) of the first derivative spectra and reference data, were able to predict root density on unknown samples using a leave-one-out cross-validation procedure. The root density predictions were more accurate when the samples from the two soil types were separated (rather than grouped) to give sub-populations (n = 54) of spectral data with more similar attributes. A better prediction of root density was achieved in the Allophanic soil (r2 = 0.83, ratio of prediction to deviation (RPD) = 2.44, root mean square error of cross-validation (RMSECV) = 1.96 mg g-1) than in the Fluvial soil (r2 = 0.75, RPD = 1.98, RMSECV = 5.11 mg g-1). It is concluded that pasture root density can be predicted from soil reflectance spectra acquired from field soil cores. Improved PLSR models for predicting field root density can be produced by selecting calibration data from field data sources with similar spectral attributes to the validation set. Root density and soil C content can be predicted independently, which could be particularly useful in studies examining potential rates of soil organic matter change. [source]
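
The two accuracy measures quoted in this abstract, RMSECV and RPD, can be illustrated with a minimal sketch. It assumes a single hypothetical spectral predictor and an ordinary least squares line instead of the PLSR model used in the paper, and it takes RPD to be the standard deviation of the reference values divided by RMSECV, which is a common convention and not something the abstract spells out. All data values and names below are invented for illustration.

```python
import math
import statistics


def fit_line(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def loo_rmsecv_rpd(x, y):
    """Leave-one-out cross-validation; returns (RMSECV, RPD)."""
    errors = []
    for i in range(len(x)):
        # Refit the model without sample i, then predict sample i.
        a, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        errors.append(y[i] - (a + b * x[i]))
    rmsecv = math.sqrt(sum(e * e for e in errors) / len(errors))
    # Assumed convention: RPD = SD of reference values / RMSECV.
    rpd = statistics.stdev(y) / rmsecv
    return rmsecv, rpd


# Hypothetical data: one spectral index and measured root density (mg/g).
spectral_index = [0.10, 0.15, 0.22, 0.30, 0.34, 0.41, 0.50, 0.58]
root_density = [2.1, 3.9, 5.2, 8.4, 8.9, 11.7, 13.8, 17.0]

rmsecv, rpd = loo_rmsecv_rpd(spectral_index, root_density)
print(f"RMSECV = {rmsecv:.2f} mg/g, RPD = {rpd:.2f}")
```

A larger RPD means the cross-validated error is small relative to the natural spread of the reference data, which is why the abstract reports it alongside r2.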


    Multivariate calibration of hyperspectral γ-ray energy spectra for proximal soil sensing

    EUROPEAN JOURNAL OF SOIL SCIENCE, Issue 1 2007
    R. A. Viscarra Rossel
    Summary The development of proximal soil sensors to collect fine-scale soil information for environmental monitoring, modelling and precision agriculture is vital. Conventional soil sampling and laboratory analyses are time-consuming and expensive. In this paper we look at the possibility of calibrating hyperspectral γ-ray energy spectra to predict various surface and subsurface soil properties. The spectra were collected with a proximal, on-the-go γ-ray spectrometer. We surveyed two geographically and physiographically different fields in New South Wales, Australia, and collected hyperspectral γ-ray data consisting of 256 energy bands at more than 20 000 sites in each field. Bootstrap aggregation with partial least squares regression (or bagging-PLSR) was used to calibrate the γ-ray spectra of each field for predictions of selected soil properties. However, significant amounts of pre-processing were necessary to expose the correlations between the γ-ray spectra and the soil data. We first filtered the spectra spatially using local kriging, then further de-noised, normalized and detrended them. The resulting bagging-PLSR models of each field were tested using leave-one-out cross-validation. Bagging-PLSR provided robust predictions of clay, coarse sand and Fe contents in the 0–15 cm soil layer and pH and coarse sand contents in the 15–50 cm soil layer. Furthermore, bagging-PLSR provided us with a measure of the uncertainty of predictions. This study is apparently the first to use a multivariate calibration technique with on-the-go proximal γ-ray spectrometry. Proximally sensed γ-ray spectrometry proved to be a useful tool for predicting soil properties in different soil landscapes. [source]


    ON THE OPPORTUNITY FOR SEXUAL SELECTION, THE BATEMAN GRADIENT AND THE MAXIMUM INTENSITY OF SEXUAL SELECTION

    EVOLUTION, Issue 7 2009
    Adam G. Jones
    Bateman's classic paper on fly mating systems inspired quantitative study of sexual selection but also resulted in much debate and confusion. Here, I consider the meaning of Bateman's principles in the context of selection theory. Success in precopulatory sexual selection can be quantified as a "mating differential," which is the covariance between trait values and relative mating success. The mating differential is converted into a selection differential by the Bateman gradient, which is the least squares regression of relative reproductive success on relative mating success. Hence, a complete understanding of precopulatory sexual selection requires knowledge of two equally important aspects of mating patterns: the mating differential, which requires a focus on mechanisms generating covariance between trait values and mating success, and the Bateman gradient, which requires knowledge of the genetic mating system. An upper limit on the magnitude of the selection differential on any sexually selected trait is given by the product of the standard deviation in relative mating success and the Bateman gradient. This latter view of the maximum selection differential provides a clearer focus on the important aspects of precopulatory sexual selection than other methods and therefore should be an important part of future studies of sexual selection. [source]
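
The quantities defined in this abstract can be traced through a small worked sketch. The data for six males are hypothetical, and the trait is standardized to unit variance, which is an assumption made here so that the upper limit (standard deviation of relative mating success times the Bateman gradient) applies directly. The helper names are invented for illustration.

```python
import statistics


def relative(values):
    """Divide each value by the mean so the result has mean 1."""
    mean = statistics.fmean(values)
    return [v / mean for v in values]


def ols_slope(x, y):
    """Least squares slope of y regressed on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def population_cov(x, y):
    """Population covariance (divides by n)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / len(x)


# Hypothetical data for six males.
mates = [0, 1, 1, 2, 3, 5]
offspring = [0, 4, 5, 9, 13, 20]
raw_trait = [10.1, 11.0, 11.4, 12.2, 12.9, 13.6]

# Standardize the trait (mean 0, unit variance): an assumption of this sketch.
t_mean, t_sd = statistics.fmean(raw_trait), statistics.pstdev(raw_trait)
trait = [(t - t_mean) / t_sd for t in raw_trait]

rel_mates = relative(mates)
rel_offspring = relative(offspring)

# Bateman gradient: regression of relative reproductive success
# on relative mating success.
bateman_gradient = ols_slope(rel_mates, rel_offspring)
# Mating differential: covariance between trait and relative mating success.
mating_differential = population_cov(trait, rel_mates)
# The gradient converts the mating differential into a selection differential.
selection_differential = bateman_gradient * mating_differential
# Upper limit for a unit-variance trait.
upper_limit = abs(bateman_gradient) * statistics.pstdev(rel_mates)

print(f"Bateman gradient       = {bateman_gradient:.3f}")
print(f"mating differential    = {mating_differential:.3f}")
print(f"selection differential = {selection_differential:.3f}")
print(f"upper limit            = {upper_limit:.3f}")
```

The bound holds because a covariance can never exceed the product of the two standard deviations, so with a unit-variance trait the mating differential is at most the standard deviation of relative mating success.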


    Gamma regression improves Haseman-Elston and variance components linkage analysis for sib-pairs

    GENETIC EPIDEMIOLOGY, Issue 2 2004
    Mathew J. Barber
    Abstract Existing standard methods of linkage analysis for quantitative phenotypes rest on the assumptions of either ordinary least squares (Haseman and Elston [1972] Behav. Genet. 2:3–19; Sham and Purcell [2001] Am. J. Hum. Genet. 68:1527–1532) or phenotypic normality (Almasy and Blangero [1998] Am. J. Hum. Genet. 68:1198–1199; Kruglyak and Lander [1995] Am. J. Hum. Genet. 57:439–454). The limitations of both these methods lie in the specification of the error distribution in the respective regression analyses. In ordinary least squares regression, the residual distribution is misspecified as being independent of the mean level. Using variance components and assuming phenotypic normality, the dependency on the mean level is correctly specified, but the remaining residual coefficient of variation is constrained a priori. Here it is shown that these limitations can be addressed (for a sample of unselected sib-pairs) using a generalized linear model based on the gamma distribution, which can be readily implemented in any standard statistical software package. The generalized linear model approach can emulate variance components when phenotypic multivariate normality is assumed (Almasy and Blangero [1998] Am. J. Hum. Genet. 68:1198–1211) and is therefore more powerful than ordinary least squares, but has the added advantage of being robust to deviations from multivariate normality and provides (often overlooked) model-fit diagnostics for linkage analysis. Genet Epidemiol 26:97–107, 2004. © 2004 Wiley-Liss, Inc. [source]


    Can a publicly funded home care system successfully allocate service based on perceived need rather than socioeconomic status? A Canadian experience

    HEALTH & SOCIAL CARE IN THE COMMUNITY, Issue 2 2007
    Abstract The present quantitative study evaluates the degree to which socioeconomic status (SES), as opposed to perceived need, determines utilisation of publicly funded home care in Ontario, Canada. The Registered Persons Data Base of the Ontario Health Insurance Plan was used to identify the age, sex and place of residence for all Ontarians who had coverage for the complete calendar year 1998. Utilisation was characterised in two dimensions: (1) propensity, the probability that an individual received service, which was estimated using a multinomial logit equation; and (2) intensity, the amount of service received, conditional on receipt. Short- and long-term service intensity were modelled separately using ordinary least squares regression. Age, sex and co-morbidity were the best predictors (P < 0.0001) of whether or not an individual received publicly funded home care as well as how much care was received, with sicker individuals having increased utilisation. The propensity and intensity of service receipt increased with lower SES (P < 0.0001), and decreased with the proportion of recent immigrants in the region (P < 0.0001), after controlling for age, sex and co-morbidity. Although the allocation of publicly funded home care service was primarily based on perceived need rather than ability to pay, barriers to utilisation for those from areas with a high proportion of recent immigrants were identified. Future research is needed to assess whether the current mix and level of publicly funded resources are indeed sufficient to offset the added costs associated with the provision of high-quality home care. [source]


    Fatty acid composition, antioxidants and lipid oxidation in chicken breasts from different production regimes

    INTERNATIONAL JOURNAL OF FOOD SCIENCE & TECHNOLOGY, Issue 4 2004
    Kishowar Jahan
    Summary Chicken breast from nine products and from the following production regimes: conventional (chilled and frozen), organic and free range, were analysed for fatty acid composition of total lipids, preventative and chain breaking antioxidant contents and lipid oxidation during 5 days of sub-ambient storage following purchase. Total lipids were extracted with an optimal amount of a cold chloroform methanol solvent. Lipid compositions varied, but there were differences between conventional and organic products in their contents of total polyunsaturated fatty acids and n-3 and n-6 fatty acids and n-6:n-3 ratio. Of the antioxidants, α-tocopherol content was inversely correlated with lipid oxidation. The antioxidant enzyme activities of catalase, glutathione peroxidase and glutathione reductase varied between products. Modelling with partial least squares regression showed no overall relationship between total antioxidants and lipid data, but certain individual antioxidants showed a relationship with specific lipid fractions. [source]


    Species–area relationships in Mediterranean-climate plant communities

    JOURNAL OF BIOGEOGRAPHY, Issue 11 2003
    Jon E. Keeley
    Abstract Aim To determine the best-fit model of species–area relationships for Mediterranean-type plant communities and evaluate how community structure affects these species–area models. Location Data were collected from California shrublands and woodlands and compared with literature reports for other Mediterranean-climate regions. Methods The number of species was recorded from 1, 100 and 1000 m2 nested plots. Best fit to the power model or exponential model was determined by comparing adjusted r2 values from the least squares regression, pattern of residuals, homoscedasticity across scales, and semi-log slopes at 1–100 m2 and 100–1000 m2. Dominance–diversity curves were tested for fit to the lognormal model, MacArthur's broken stick model, and the geometric and harmonic series. Results Early successional Western Australia and California shrublands represented the extremes and provide an interesting contrast as the exponential model was the best fit for the former, and the power model for the latter, despite similar total species richness. We hypothesize that structural differences in these communities account for the different species–area curves and are tied to patterns of dominance, equitability and life form distribution. Dominance–diversity relationships for Western Australian heathlands exhibited a close fit to MacArthur's broken stick model, indicating more equitable distribution of species. In contrast, Californian shrublands, both postfire and mature stands, were best fit by the geometric model indicating strong dominance and many minor subordinate species. These regions differ in life form distribution, with annuals being a major component of diversity in early successional Californian shrublands although they are largely lacking in mature stands. Both young and old Australian heathlands are dominated by perennials, and annuals are largely absent. Inherent in all of these ecosystems is cyclical disequilibrium caused by periodic fires. The potential for community reassembly is greater in Californian shrublands where only a quarter of the flora resprout, whereas three quarters resprout in Australian heathlands. Other Californian vegetation types sampled include coniferous forests, oak savannas and desert scrub, and demonstrate that different community structures may lead to a similar species–area relationship. Dominance–diversity relationships for coniferous forests closely follow a geometric model whereas associated oak savannas show a close fit to the lognormal model. However, for both communities, species–area curves fit a power model. The primary driver appears to be the presence of annuals. Desert scrub communities illustrate dramatic changes in both species diversity and dominance–diversity relationships in high and low rainfall years, because of the disappearance of annuals in drought years. Main conclusions Species–area curves for immature shrublands in California and the majority of Mediterranean plant communities fit a power function model. Exceptions that fit the exponential model are not because of sampling error or scaling effects; rather, structural differences in these communities provide plausible explanations. The exponential species–area model may arise in more than one way. In the highly diverse Australian heathlands it results from a rapid increase in species richness at small scales. In mature California shrublands it results from very depauperate richness at the community scale. In both instances the exponential model is tied to a preponderance of perennials and paucity of annuals. For communities fit by a power model, coefficients z and log c exhibit a number of significant correlations with other diversity parameters, suggesting that they have some predictive value in ecological communities. [source]
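
The model comparison described in the Methods can be sketched as two least squares fits on the same nested-plot data. The sketch assumes the usual forms of the two models: the power model as log S = log c + z log A, and the exponential model as S = c + z log A. The species counts are hypothetical and were chosen so that the power model fits exactly.

```python
import math
import statistics


def ols_fit(x, y):
    """Least squares fit of y = a + b*x; returns (a, b, r2, adjusted r2)."""
    n = len(x)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)
    return a, b, r2, adj_r2


# Hypothetical species counts in 1, 100 and 1000 m2 nested plots.
area = [1, 100, 1000]
species = [8, 32, 64]

log_area = [math.log10(a) for a in area]
log_species = [math.log10(s) for s in species]

# Power model: log S = log c + z * log A.
power = ols_fit(log_area, log_species)
# Exponential model: S = c + z * log A.
expo = ols_fit(log_area, species)

print(f"power model:       z = {power[1]:.3f}, adjusted r2 = {power[3]:.3f}")
print(f"exponential model: z = {expo[1]:.3f}, adjusted r2 = {expo[3]:.3f}")
```

With only three plot sizes the adjusted r2 is a coarse criterion, which may be why the abstract also lists residual patterns, homoscedasticity and semi-log slopes.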


    Robust partial least squares regression: Part II, new algorithm and benchmark studies

    JOURNAL OF CHEMOMETRICS, Issue 1 2008
    Uwe Kruger
    Abstract This paper presents the second part of the work on robust partial least squares (RPLS) regression and develops a new RPLS algorithm based on the concept laid out in Part I. The paper also contrasts the new algorithm with existing work using two simulation examples. This comparison highlights (i) the impact of the flaws in existing RPLS work and (ii) the compromised sensitivity resulting from introducing simplifications to the determination of the Stahel–Donoho estimator (SDE). The paper finally presents an evaluation of the computational complexity of RPLS algorithms and examines the impact of the signal-to-noise ratio (SNR) upon the sensitivity of detecting outliers. The third part of this work will examine practical aspects of RPLS applications based on the analysis of experimental data. Copyright © 2007 John Wiley & Sons, Ltd. [source]


    Theory of net analyte signal vectors in inverse regression

    JOURNAL OF CHEMOMETRICS, Issue 12 2003
    Rasmus Bro
    Abstract The net analyte signal and the net analyte signal vector are useful measures in building and optimizing multivariate calibration models. In this paper a theory for their use in inverse regression is developed. The theory of net analyte signal was originally derived from classical least squares in spectral calibration where the responses of all pure analytes and interferents are assumed to be known. However, in chemometrics, inverse calibration models such as partial least squares regression are more abundant and several tools for calculating the net analyte signal in inverse regression models have been proposed. These methods yield different results and most do not provide results that are in accordance with the chosen calibration model. In this paper a thorough development of a calibration-specific net analyte signal vector is given. This definition turns out to be almost identical to the one recently suggested by Faber (Anal. Chem. 1998; 70: 5108–5110). A required correction of the net analyte signal in situations with negative predicted responses is also discussed. Copyright © 2004 John Wiley & Sons, Ltd. [source]


    Non-parametric statistical methods for multivariate calibration model selection and comparison

    JOURNAL OF CHEMOMETRICS, Issue 12 2003
    Edward V. Thomas
    Abstract Model selection is an important issue when constructing multivariate calibration models using methods based on latent variables (e.g. partial least squares regression and principal component regression). It is important to select an appropriate number of latent variables to build an accurate and precise calibration model. Inclusion of too few latent variables can result in a model that is inaccurate over the complete space of interest. Inclusion of too many latent variables can result in a model that produces noisy predictions through incorporation of low-order latent variables that have little or no predictive value. Commonly used metrics for selecting the number of latent variables are based on the predicted error sum of squares (PRESS) obtained via cross-validation. In this paper a new approach for selecting the number of latent variables is proposed. In this new approach the prediction errors of individual observations (obtained from cross-validation) are compared across models incorporating varying numbers of latent variables. Based on these comparisons, non-parametric statistical methods are used to select the simplest model (least number of latent variables) that provides prediction quality that is indistinguishable from that provided by more complex models. Unlike methods based on PRESS, this new approach is robust to the effects of anomalous observations. More generally, the same approach can be used to compare the performance of any models that are applied to the same data set where reference values are available. The proposed methodology is illustrated with an industrial example involving the prediction of gasoline octane numbers from near-infrared spectra. Published in 2004 by John Wiley & Sons, Ltd. [source]
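
The selection idea in this abstract, comparing per-observation cross-validation errors between models of different complexity, can be sketched as follows. The abstract does not name the non-parametric test, so the one-sided exact sign test used here is an assumption made purely for illustration and is not necessarily the authors' method. The error values, the 0.05 threshold and the function names are likewise hypothetical.

```python
import math


def sign_test_p(simple_errors, complex_errors):
    """One-sided exact sign test on paired absolute errors.

    Returns the probability, if neither model were really better, of the
    more complex model winning at least as many observations as it did.
    Ties are dropped.
    """
    pairs = list(zip(simple_errors, complex_errors))
    complex_wins = sum(1 for s, c in pairs if abs(c) < abs(s))
    simple_wins = sum(1 for s, c in pairs if abs(s) < abs(c))
    n = complex_wins + simple_wins
    if n == 0:
        return 1.0
    return sum(math.comb(n, i) for i in range(complex_wins, n + 1)) / 2 ** n


def simplest_adequate(cv_errors, alpha=0.05):
    """Smallest model size not significantly worse than any larger one."""
    sizes = sorted(cv_errors)
    for k in sizes:
        larger = [m for m in sizes if m > k]
        if all(sign_test_p(cv_errors[k], cv_errors[m]) >= alpha for m in larger):
            return k


# Hypothetical cross-validation errors for 1, 2 and 3 latent variables.
cv_errors = {
    1: [1.2, -0.9, 1.1, -1.3, 0.8, 1.0, -1.1, 0.9, -1.2, 1.0],
    2: [0.3, -0.2, 0.4, -0.1, 0.2, 0.3, -0.4, 0.2, -0.3, 0.1],
    3: [0.2, -0.3, 0.3, -0.2, 0.3, 0.2, -0.3, 0.3, -0.2, 0.2],
}

chosen = simplest_adequate(cv_errors)
print("selected number of latent variables:", chosen)
```

Because the test counts wins and losses instead of summing squared errors, a single anomalous observation changes the count by at most one, which is in the spirit of the robustness the abstract claims over PRESS-based criteria.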


    Robust methods for partial least squares regression

    JOURNAL OF CHEMOMETRICS, Issue 10 2003
    M. Hubert
    Abstract Partial least squares regression (PLSR) is a linear regression technique developed to deal with high-dimensional regressors and one or several response variables. In this paper we introduce robustified versions of the SIMPLS algorithm, this being the leading PLSR algorithm because of its speed and efficiency. Because SIMPLS is based on the empirical cross-covariance matrix between the response variables and the regressors and on linear least squares regression, the results are affected by abnormal observations in the data set. Two robust methods, RSIMCD and RSIMPLS, are constructed from a robust covariance matrix for high-dimensional data and robust linear regression. We introduce robust RMSECV and RMSEP values for model calibration and model validation. Diagnostic plots are constructed to visualize and classify the outliers. Several simulation results and the analysis of real data sets show the effectiveness and robustness of the new approaches. Because RSIMPLS is roughly twice as fast as RSIMCD, it stands out as the overall best method. Copyright © 2003 John Wiley & Sons, Ltd. [source]


    A robust PCR method for high-dimensional regressors

    JOURNAL OF CHEMOMETRICS, Issue 8-9 2003
    Mia Hubert
    Abstract We consider the multivariate calibration model which assumes that the concentrations of several constituents of a sample are linearly related to its spectrum. Principal component regression (PCR) is widely used for the estimation of the regression parameters in this model. In the classical approach it combines principal component analysis (PCA) on the regressors with least squares regression. However, both stages yield very unreliable results when the data set contains outlying observations. We present a robust PCR (RPCR) method which also consists of two parts. First we apply a robust PCA method for high-dimensional data on the regressors, then we regress the response variables on the scores using a robust regression method. A robust RMSECV value and a robust R2 value are proposed as exploratory tools to select the number of principal components. The prediction error is also estimated in a robust way. Moreover, we introduce several diagnostic plots which are helpful to visualize and classify the outliers. The robustness of RPCR is demonstrated through simulations and the analysis of a real data set. Copyright © 2003 John Wiley & Sons, Ltd. [source]


    Real-time forecasting of photosmog episodes: the Naples case study

    JOURNAL OF CHEMOMETRICS, Issue 7 2001
    A. Riccio
    Abstract In this paper we analysed the ozone time series data collected by the local monitoring network in the Naples urban area (southern Italy) during the spring/summer period of 1996. Our aim was to identify a reliable and effective model that could be used for the real-time forecasting of photosmog episodes. We studied the applicability of seasonal autoregressive integrated moving average models with some exogenous variables (ARIMAX) to our case study. The choice of exogenous variables (temperature, [NO2]/[NO] ratio and wind speed) was based on physical reasoning. The forecasting performance of all models was evaluated with data not used in model development, by means of an array of statistical indices: the comparison between observed and forecast means and standard deviations; intercept and slope of a least squares regression of forecast variable on observed variable; mean absolute and root mean square errors; and 95% confidence limits of forecast variable. The assessment of all models was also based on their tendency to forecast critical episodes. It was found that the model using information from the temperature data set to predict peak ozone levels gives satisfactory results, about 70% of critical episodes being correctly predicted by the 24 h ahead forecast function. Copyright © 2001 John Wiley & Sons, Ltd. [source]
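
Several of the evaluation indices listed in this abstract are simple to compute once observed and forecast values are paired. The sketch below covers the means, the standard deviations, the intercept and slope of the least squares regression of forecast on observed, and the mean absolute and root mean square errors. The ozone values are hypothetical, and the function name and dictionary keys are invented for illustration. A perfect forecast would give an intercept of 0 and a slope of 1.

```python
import math
import statistics


def forecast_indices(observed, forecast):
    """Summary indices comparing a forecast series with observations."""
    mo, mf = statistics.fmean(observed), statistics.fmean(forecast)
    sxx = sum((o - mo) ** 2 for o in observed)
    sxy = sum((o - mo) * (f - mf) for o, f in zip(observed, forecast))
    slope = sxy / sxx  # regression of forecast on observed
    diffs = [f - o for o, f in zip(observed, forecast)]
    return {
        "mean_observed": mo,
        "mean_forecast": mf,
        "sd_observed": statistics.stdev(observed),
        "sd_forecast": statistics.stdev(forecast),
        "intercept": mf - slope * mo,
        "slope": slope,
        "mae": statistics.fmean(abs(d) for d in diffs),
        "rmse": math.sqrt(statistics.fmean(d * d for d in diffs)),
    }


# Hypothetical daily peak ozone concentrations (arbitrary units).
observed = [82.0, 95.0, 110.0, 74.0, 130.0, 101.0, 88.0]
forecast = [85.0, 90.0, 104.0, 80.0, 118.0, 99.0, 93.0]

for name, value in forecast_indices(observed, forecast).items():
    print(f"{name:>13}: {value:.2f}")
```

A slope below 1 together with a positive intercept indicates a forecast that under-predicts high values and over-predicts low ones, which matters when the goal is to catch critical episodes.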


    PRELIMINARY EVALUATION OF THE APPLICATION OF THE FTIR SPECTROSCOPY TO CONTROL THE GEOGRAPHIC ORIGIN AND QUALITY OF VIRGIN OLIVE OILS

    JOURNAL OF FOOD QUALITY, Issue 4 2007
    ALESSANDRA BENDINI
    ABSTRACT A rapid Fourier transform infrared (FTIR) attenuated total reflectance spectroscopic method was applied to determine qualitative parameters such as free fatty acid (FFA) content and the peroxide value (POV) in virgin olive oils. Calibration models were constructed using partial least squares regression on a large number of virgin olive oil samples. The best results (R2 = 0.955, root mean square error in cross validation [RMSECV] = 0.15) to evaluate FFA content expressed in oleic acid % (w/w) were obtained considering a calibration range from 0.2 to 9.2% of FFA relative to 190 samples. For POV determination, the result obtained, built on 80 olive oil samples with a calibration range from 11.1 to 49.7 meq O2/kg of oil, was not satisfactory (R2 = 0.855, RMSECV = 3.96). We also investigated the capability of FTIR spectroscopy, in combination with multivariate analysis, to distinguish virgin olive oils based on geographic origin. The spectra of 84 monovarietal virgin olive oil samples from eight Italian regions were collected and elaborated by principal component analysis (PCA), considering the fingerprint region. The results were satisfactory and could successfully discriminate the majority of samples coming from the Emilia Romagna, Sardinian and Sicilian regions. Moreover, the explained variance from this PCA was higher than 96%. PRACTICAL APPLICATIONS The verification of the declared origin or the determination of the origin of an unidentified virgin olive oil is a challenging problem. In this work, we have studied the applicability of Fourier transform infrared coupled with multivariate statistical analysis to discriminate the geographic origin of virgin olive oil samples from different Italian regions. [source]


    Relation between Developmental Stage, Sensory Properties, and Volatile Content of Organically and Conventionally Grown Pac Choi (Brassica rapa var. Mei Qing Choi)

    JOURNAL OF FOOD SCIENCE, Issue 4 2010
    ABSTRACT: This study was conducted to identify and quantify the sensory characteristics and chemical profile of organically and conventionally grown pac choi (Brassica rapa var. Mei Qing Choi), also called bok choy, at 3 stages of growth (2.5, 4.5, and 6.5 wk). Sensory and instrumental data were correlated using partial least squares regression. Pac choi was grown in late spring. Descriptive sensory analysis was conducted by a highly trained panel and compounds were identified and quantified using a gas chromatograph/mass spectrometer. The findings of the study indicate that the differences in sensory characteristics and chemical profiles among stages of growth are more substantial than the differences between organic and conventional production. Green-unripe, musty/earthy, lettuce, and sweet flavors are representative in pac choi at early stages of growth. When older, pac choi has higher intensities of green-grassy/leafy, bitter, cabbage, and sulfur flavors that are associated with the increase of (Z)-3-hexen-1-ol, octyl acetate, 1-nonanol, 2-decanone, 1-penten-3-ol, linalool, camphor, menthol, isobornyl acetate, geranylacetone, and cedrol compounds. Conventional pac choi was higher than organic pac choi in green overall, bitter, and soapy flavors only at 2.5 wk of age. This may be associated with the presence of (Z)-3-hexenal, 2-hexyn-1-ol, and (E)-2-hexenal compounds. Practical Application: The increased popularity of organic production has amplified the need for research that will help in understanding how this production system affects the final quality of food products. This study suggests that the stage of development has a much larger impact on sensory quality than organic or conventional growing of pac choi. Findings from this study promote consumer choice by showing that comparable sensory quality can be obtained using either production system, so that the ultimate choice can rest not only on sensory quality but also on environmental beliefs or economics. [source]


    Relating Descriptive Sensory Analysis to Gas Chromatography/Olfactometry Ratings of Fresh Strawberries Using Partial Least Squares Regression

    JOURNAL OF FOOD SCIENCE, Issue 7 2004
    K.F. Schulbach
    ABSTRACT: Sensory properties of 5 strawberry varieties were related to gas chromatography/olfactometry (GC/O) analysis using partial least squares regression (PLS). The sour and green sensory aspects were strongly associated with titratable acidity, hexanal, and E-2 hexenal. The caramel/sweet character was differentiated from the strawberry/fruity character by its stronger association with Furaneol, which had a high score in the 2nd PLS dimension. The sensory scores for peach and the GC/O ratings for the peach-like lactones were also associated. The fruity sensory scores and the floral sensory scores were not well correlated with compounds having fruity or floral character. This lack of relationship could partially be explained by covariance among the sensory ratings for the samples. [source]


    Monitoring the film coating unit operation and predicting drug dissolution using terahertz pulsed imaging

    JOURNAL OF PHARMACEUTICAL SCIENCES, Issue 12 2009
    Louise Ho
    Abstract Understanding the coating unit operation is imperative to improve product quality and reduce output risks for coated solid dosage forms. Three batches of sustained-release tablets coated with the same process parameters (pan speed, spray rate, etc.) were subjected to terahertz pulsed imaging (TPI) analysis followed by dissolution testing. Mean dissolution times (MDT) from conventional dissolution testing were correlated with terahertz waveforms, which yielded a multivariate, partial least squares regression (PLS) model with an R2 of 0.92 for the calibration set and 0.91 for the validation set. This two-component PLS model was built from batch I, which was coated under the same environmental conditions (air temperature, humidity, etc.) as batch II but under different environmental conditions from batch III. The MDTs of batch II were predicted in a nondestructive manner with the developed PLS model, and the accuracy of the predicted values was subsequently validated with conventional dissolution testing and found to be in good agreement. The terahertz PLS model was also shown to be sensitive to changes in the coating conditions, successfully identifying the larger coating variability in batch III. In this study, we demonstrated that TPI in conjunction with PLS analysis could be employed to assist with film coating process understanding and provide predictions on drug dissolution. © 2009 Wiley-Liss, Inc. and the American Pharmacists Association J Pharm Sci 98:4866–4876, 2009 [source]


    Multivariate calibration of covalent aggregate fraction to the Raman spectrum of regular human insulin

    JOURNAL OF PHARMACEUTICAL SCIENCES, Issue 9 2008
    Connie M. Gryniewicz
    Abstract Insulin aggregates were prepared by exposing samples of formulated regular human insulin to agitation at 60°C. Aliquots were drawn from the samples periodically over a time range spanning 192 h, and their aggregate compositions were determined with size exclusion chromatography. The complete data set was composed of 39 separate aliquots. The Raman spectra of three separate 10 µL volumes from each aliquot were measured using the drop-coat deposition Raman (DCDR) method. The spectra were calibrated to aggregate composition by partial least squares regression (PLS), resulting in linear calibration (R2 = 0.997) with a root mean squared error of calibration (RMSEC) of 1.3% and a root mean squared error of cross validation (RMSECV) of 5.1% in aggregate composition. Though the time required for aggregates to form under stressed conditions showed substantial sample-to-sample variation, the correlation between aggregate composition and Raman spectrum was remarkably consistent, indicating that Raman spectroscopy may be a viable screening method for aggregation of protein drugs. © 2008 Wiley-Liss, Inc. and the American Pharmacists Association J Pharm Sci 97:3727–3734, 2008 [source]


    THE USE OF NEAR INFRARED REFLECTANCE SPECTROMETRY FOR CHARACTERIZATION OF BROWN ALGAL TISSUE

    JOURNAL OF PHYCOLOGY, Issue 5 2010
    Kyra B. Hay
    Measuring qualitative traits of plant tissue is important to understand how plants respond to environmental change and biotic interactions. Near infrared reflectance spectrometry (NIRS) is a cost-, time-, and sample-effective method of measuring chemical components in organic samples commonly used in the agricultural and pharmaceutical industries. To assess the applicability of NIRS to measure the ecologically important tissue traits of carbon, nitrogen, and phlorotannins (secondary metabolites) in brown algae, we developed NIRS calibration models for these constituents in dried Sargassum flavicans (F. K. Mertens) C. Agardh tissue. We then tested if the developed NIRS models could detect changes in the tissue composition of S. flavicans induced by experimental manipulation of temperature and nutrient availability. To develop the NIRS models, we used partial least squares regression to determine the statistical relationship between trait values determined in laboratory assays and the NIRS spectral data of S. flavicans calibration samples. Models with high predictive power were developed for all three constituents that successfully detected changes in carbon, nitrogen, and phlorotannin content in the experimentally manipulated S. flavicans tissue. Phlorotannin content in S. flavicans was inversely related to nitrogen availability, and nitrogen, temperature, and tissue age interacted to significantly affect phlorotannin content, demonstrating the importance of studies that investigate these three variables simultaneously. Given the speed of analysis, accuracy, small tissue requirements, and ability to measure multiple traits simultaneously without consuming the sample tissue, NIRS is a valuable alternative to traditional methods for determining algal tissue traits, especially in studies where tissue is limited. [source]


    CONSUMER EVALUATION OF MILK AUTHENTICITY EXPLAINED BOTH BY CONSUMER BACKGROUND CHARACTERISTICS AND BY PRODUCT SENSORY DESCRIPTORS

    JOURNAL OF SENSORY STUDIES, Issue 6 2007
    L.W. FRANDSEN
    ABSTRACT Consumer authenticity tests were used to elicit consumer response to the influence of fodder and storage time on the flavor of cow milk. A panel of professional tasters was used to provide a descriptive profile of the sensory characteristics of the milk. Consumer background characteristics were collected through a questionnaire concerning demographic and consumption pattern variables as well as assessments using two attitude scales: a modified set of food neophobia questions and a set of milk xenophobia questions. A multivariate data analytical method (L-shaped partial least squares regression) was used to model the variation in the authenticity evaluation simultaneously from two different sources: the storage/feed effects as described by the sensory panel and the consumer background variables. Results showed that milk samples with storage/feed characteristics were evaluated as "foreign" (not Danish) by some segments of the consumers. PRACTICAL APPLICATIONS Very small differences in a food product, here milk, sometimes cannot be discerned by standard sensory methods. The test in this article, the authenticity test, is able to assess such differences. The article examines whether consumer characteristics influence the results of the authenticity test, to see if the test is broadly applicable. With respect to milk, a number of effects of fodder and storage time on the acceptance of milk appear. These factors can be of use for milk producers, and the differences in product acceptance between consumers may help milk producers to target products at consumer segments. [source]


    Sparse partial least squares regression for simultaneous dimension reduction and variable selection

    JOURNAL OF THE ROYAL STATISTICAL SOCIETY: SERIES B (STATISTICAL METHODOLOGY), Issue 1 2010
    Hyonho Chun
    Summary. Partial least squares regression has been an alternative to ordinary least squares for handling multicollinearity in several areas of scientific research since the 1960s. It has recently gained much attention in the analysis of high dimensional genomic data. We show that known asymptotic consistency of the partial least squares estimator for a univariate response does not hold with the very large p and small n paradigm. We derive a similar result for a multivariate response regression with partial least squares. We then propose a sparse partial least squares formulation which aims simultaneously to achieve good predictive performance and variable selection by producing sparse linear combinations of the original predictors. We provide an efficient implementation of sparse partial least squares regression and compare it with well-known variable selection and dimension reduction approaches via simulation experiments. We illustrate the practical utility of sparse partial least squares regression in a joint analysis of gene expression and genomewide binding data. [source]


    Colour characteristics of honeys as influenced by pollen grain content: a multivariate study

    JOURNAL OF THE SCIENCE OF FOOD AND AGRICULTURE, Issue 4 2004
    Anass Terrab
    Abstract A chromatic analysis by tristimulus colorimetry and a pollen analysis (pollen grains contained in each honey sample, considering their volume and geometrical shape) were carried out on 33 Eucalyptus unifloral honeys; the colour of the pollen grains was also considered. Multiple linear regression (MLR) and partial least squares regression (PLSR) were used to establish equations relating the chromatic variables to the pollen data, i.e. the number and morphology of the pollen grains, thus allowing the prediction of the ultimate colour from the botanical characteristics. The results obtained show that lightness (L*) is significantly (p < 0.05) related to the pollen type Olea europaea; on the other hand, the variable that best relates to the chroma is the Zea mays pollen type. Copyright © 2004 Society of Chemical Industry [source]


    Mathematical modelling of the heat inactivation of trypsin inhibitors in soymilk at 121–154 °C

    JOURNAL OF THE SCIENCE OF FOOD AND AGRICULTURE, Issue 3 2002
    Kin-Chor Kwok
    Abstract Response surface methodology (RSM) was used to investigate the effects of processing temperature and time on the inactivation of trypsin inhibitors (TI) in soymilk. The factorial experimental design consisted of four levels of temperature and six levels of time in a temperature range of 121–154 °C and a time interval of 10–90 s. A quadratic polynomial equation, relating log(% TI retained) as a function of heating time and temperature, was satisfactorily fitted to the experimental data by least squares regression with r2 (correlation coefficient) = 0.959. Within the range of heating times investigated, TI in soymilk was satisfactorily destroyed to 10% retained at 143 and 154 °C with 62 and 29 s heating time respectively. © 2002 Society of Chemical Industry [source]
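The fitting step above, a quadratic polynomial in temperature and time estimated by least squares on a 4 × 6 factorial design, can be sketched as follows. Everything numeric here is an assumption for illustration: the individual design levels, the coefficients in `TRUE_BETA`, and the noise-free synthetic response are hypothetical and are not the paper's data. Predictors are coded to the range −1 to 1, as is common in response surface work, which also keeps the normal equations well conditioned.

```python
from itertools import product


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def quad_features(temp_c, time_s):
    """Full quadratic model terms in coded temperature (x1) and time (x2)."""
    x1 = (temp_c - 137.5) / 16.5   # 121..154 degC mapped to -1..1
    x2 = (time_s - 50.0) / 40.0    # 10..90 s mapped to -1..1
    return [1.0, x1, x2, x1 * x1, x2 * x2, x1 * x2]


def fit_least_squares(rows, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def r_squared(rows, y, beta):
    """Coefficient of determination of the fitted model."""
    mean_y = sum(y) / len(y)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical design levels (4 temperatures x 6 times) and coefficients.
temps = [121.0, 132.0, 143.0, 154.0]
times = [10.0, 26.0, 42.0, 58.0, 74.0, 90.0]
TRUE_BETA = [1.2, -0.6, -0.4, 0.05, 0.03, -0.1]

design = [quad_features(t, s) for t, s in product(temps, times)]
log_ti = [sum(b * v for b, v in zip(TRUE_BETA, row)) for row in design]  # synthetic response

beta = fit_least_squares(design, log_ti)
r2 = r_squared(design, log_ti, beta)
```

With measured data in place of the synthetic response, `beta` would hold the fitted surface and `r2` its goodness of fit.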


    Rejecting the mean: Estimating the response of fen plant species to environmental factors by non-linear quantile regression

    JOURNAL OF VEGETATION SCIENCE, Issue 4 2005
    Henning K. Schröder
    Abstract Question: Is quantile regression an appropriate statistical approach to estimate the response of fen species to single environmental factors? Background: Data sets in vegetation field studies are often characterized by a large number of zeros and they are generally incomplete in respect to the factors which possibly influence plant species distribution. Thus, it is problematic to relate plant species abundance to single environmental factors by the ordinary least squares regression technique of the conditional mean. Location: Riparian herbaceous fen in central Jutland (Denmark). Methods: Semi-parametric quantile regression was used to estimate the response of 18 plant species to six environmental factors; 95% regression quantiles were chosen to reduce the impact of multiple unmeasured factors on the regression analyses. Results of 95% quantile regression and ordinary least squares regression were compared. Results: The standard regression of the conditional mean underestimated the rates of change of species cover due to the selected factor in comparison to 95% regression quantiles. The fitted response curves indicated a general broad tolerance of the studied fen species to different flooding durations but a narrower range concerning groundwater amplitude. The cover of all species was related to soil exchangeable phosphate and base-richness. A relationship between soil exchangeable potassium and species cover was only found for 11 species. Conclusion: Considering the characteristics of data sets in vegetation science, non-linear quantile regression is a useful method for gradient analyses. [source]
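The contrast drawn above between the conditional mean and an upper regression quantile can be illustrated in its simplest, intercept-only form. The sketch below is not the semi-parametric method of the study; the cover values are hypothetical and the helper names are assumptions. It shows only that minimising the quantile ("pinball") loss at τ = 0.95 targets the upper edge of zero-heavy data, whereas the mean (the least squares solution) is pulled towards the zeros.

```python
def pinball_loss(residual, tau):
    """Quantile loss: weight tau on positive residuals, (1 - tau) on negative ones."""
    return tau * residual if residual >= 0 else (tau - 1.0) * residual


def best_constant(values, tau):
    """Constant minimising the summed pinball loss; a minimiser always lies at a data point."""
    candidates = sorted(set(values))
    return min(candidates, key=lambda c: sum(pinball_loss(v - c, tau) for v in values))


# Hypothetical species cover (%) across 21 plots, with many zeros.
cover = [0.0] * 12 + [5.0, 10.0, 15.0, 20.0, 30.0, 40.0, 60.0, 80.0, 90.0]

mean_cover = sum(cover) / len(cover)   # least squares estimate of a constant
q95 = best_constant(cover, 0.95)       # 95% quantile estimate
q50 = best_constant(cover, 0.50)       # median
```

In a full quantile regression the constant is replaced by a function of the environmental factor and the same loss is minimised over its parameters.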


    In-Line Monitoring of Vinyl Chloride Suspension Polymerization with Near-Infrared Spectroscopy, 1 – Analysis of Morphological Properties

    MACROMOLECULAR REACTION ENGINEERING, Issue 1 2010
    João Miguel de Faria Jr.
    Abstract It is demonstrated that during suspension polymerizations it is possible to monitor morphological characteristics of PVC resins such as bulk density, cold plasticizer absorption and average particle diameter in-line and in real time using NIR spectroscopy. NIR spectra are obtained at different experimental conditions, showing that the spectra are sensitive to changes in the PVC properties. Standard mathematical procedures (partial least squares regression) are used to build empirical models and correlate the morphological properties with the obtained NIR spectra, allowing for monitoring of the PVC morphology in-line and in real time. [source]


    Diversity in fertility patterns in Guatemala

    POPULATION, SPACE AND PLACE (PREVIOUSLY:-INT JOURNAL OF POPULATION GEOGRAPHY), Issue 6 2006
    Sofie De Broe
    Abstract This study investigates urban and rural fertility trends in Guatemala up to 2002. It also aims to establish, using the theory of diffusion as its theoretical framework, the extent to which ethnicity and ethnic diversity are associated with geographical patterns in local-level fertility after controlling for socio-economic indicators. Data from the Demographic and Health Surveys of 1987, 1995–96 and 1998–99, the National Maternal and Child Health Survey of 2002 and the Census of 2002 were used. P/F ratios were calculated and used as an analytical tool and quality control measure of the data in order to establish the timing of changes in fertility patterns as measured by age-specific fertility rates (ASFRs) based on exact exposure in four-year periods from 1972 to 2002. Finally, the 2002 census data were used to analyse and model fertility at the municipio level using ordinary least squares regression. The results suggest a steady but very slow decline in fertility from 1972 until the mid-1990s. Both the P/F ratios and ASFRs calculated using the Maternal and Child Health Survey and Census of 2002 show a sharp decline in fertility since 1998. The regression results for the census data suggest an independent and significant effect of 'proportion of indigenous people' and an almost significant effect of ethnic diversity on fertility at the municipio level. The very slow decline in fertility in Guatemala until fairly recently can be attributed to the fact that Guatemala has been lagging behind in terms of socio-economic development and the additional challenge of having a culturally very diverse and segregated population, preventing the spread of modern reproductive ideas and behaviour. The accelerated fertility decline since the end of the 1990s seems likely to be associated with the widespread availability and increased uptake of family planning following declining fertility desires among its indigenous population. 
Copyright © 2006 John Wiley & Sons, Ltd. [source]


    Changes in cod muscle proteins during frozen storage revealed by proteome analysis and multivariate data analysis

    PROTEINS: STRUCTURE, FUNCTION AND BIOINFORMATICS, Issue 5 2006
    Inger V. H. Kjærsgård Dr.
    Abstract Multivariate data analysis has been combined with proteomics to enhance the recovery of information from 2-DE of cod muscle proteins during different storage conditions. Proteins were extracted according to 11 different storage conditions and samples were resolved by 2-DE. Data generated by 2-DE was subjected to principal component analysis (PCA) and discriminant partial least squares regression (DPLSR). Applying PCA to 2-DE data revealed the samples to form groups according to frozen storage time, whereas differences due to different storage temperatures or chilled storage in modified atmosphere packing did not lead to distinct changes in protein pattern. Applying DPLSR to the 2-DE data enabled the selection of protein spots critical for differentiating 3 and 6 months of frozen storage from 12 months of frozen storage. Some of these protein spots have been identified by MS/MS, revealing myosin light chain 1, 2 and 3, triose-phosphate isomerase, glyceraldehyde-3-phosphate dehydrogenase, aldolase A and two α-actin fragments, and a nuclease diphosphate kinase B fragment to change in concentration during frozen storage. Application of proteomics, multivariate data analysis and MS/MS to analyse protein changes in cod muscle proteins during storage has revealed new knowledge on the issue and enables a better understanding of biochemical processes occurring. [source]


    Inflation of Type I error rate in multiple regression when independent variables are measured with error

    THE CANADIAN JOURNAL OF STATISTICS, Issue 1 2009
    Jerry Brunner
    MSC 2000: Primary 62J99; secondary 62H15 Abstract When independent variables are measured with error, ordinary least squares regression can yield parameter estimates that are biased and inconsistent. This article documents an inflation of Type I error rate that can also occur. In addition to analytic results, a large-scale Monte Carlo study shows unacceptably high Type I error rates under circumstances that could easily be encountered in practice. A set of smaller-scale simulations indicates that the problem applies to various types of regression and various types of measurement error. The Canadian Journal of Statistics 37: 33-46; 2009 © 2009 Statistical Society of Canada [source]
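The first claim above, that measurement error in a predictor biases ordinary least squares estimates, can be checked with a small simulation. This sketch covers only the classical single-predictor case and does not reproduce the Type I error study; the sample size, seed, variances and true slope are assumptions chosen for illustration. With equal variances for the true predictor and its measurement error, the expected slope is attenuated to about half its true value.

```python
import random


def ols_slope(xs, ys):
    """Slope of the simple least squares regression of ys on xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


rng = random.Random(0)   # fixed seed so the run is repeatable
n = 20000                # hypothetical sample size
true_slope = 1.0

x_true = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [true_slope * x + rng.gauss(0.0, 1.0) for x in x_true]
# Observed predictor = true predictor + independent measurement error.
x_observed = [x + rng.gauss(0.0, 1.0) for x in x_true]

slope_clean = ols_slope(x_true, y)      # close to the true slope
slope_noisy = ols_slope(x_observed, y)  # attenuated towards zero
```

The attenuation factor in this setting is var(x) / (var(x) + var(error)), which equals 0.5 for the variances assumed here.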


    The location of white matter lesions and gait: A voxel-based study

    ANNALS OF NEUROLOGY, Issue 2 2010
    Velandai Srikanth PhD
    Little is known about the influence of cerebral white matter lesion (WML) location on gait. We applied partial least squares regression in brain magnetic resonance imaging scans (n = 385) to evaluate which WML voxel systems were independently associated with a composite gait score and identified affected tracts using a diffusion tensor imaging template. Bilateral frontal and periventricular WML-affected voxels corresponding to major anterior projection fibers (thalamic radiations, corticofugal motor tracts) and adjacent association fibers (corpus callosum, superior fronto-occipital fasciculus, short association fibers) showed the greatest covariance with poorer gait. WMLs probably contribute to age-related gait decline by disconnecting motor networks served by these tracts. ANN NEUROL 2010;67:265–269 [source]


    QSAR Modeling of a Set of Pyrazinoate Esters as Antituberculosis Prodrugs

    ARCHIV DER PHARMAZIE, Issue 2 2010
    João P. S. Fernandes
    Abstract Tuberculosis is an infection caused mainly by Mycobacterium tuberculosis. A first-line antimycobacterial drug is pyrazinamide (PZA), which acts partially as a prodrug activated by a pyrazinamidase releasing the active agent, pyrazinoic acid (POA). As pyrazinoic acid has some difficulty crossing the mycobacterial cell wall, and pyrazinamide-resistant strains do not express the pyrazinamidase, a set of pyrazinoic acid esters have been evaluated as antimycobacterial agents. In this work, a QSAR approach was applied to a set of forty-three pyrazinoates against M. tuberculosis ATCC 27294, using genetic algorithm function and partial least squares regression (WOLF 5.5 program). The independent variables selected were the Balaban index (J), calculated n-octanol/water partition coefficient (ClogP), van-der-Waals surface area, dipole moment, and stretching-energy contribution. The final QSAR model (N = 32, r2 = 0.68, q2 = 0.59, LOF = 0.25, and LSE = 0.19) was fully validated employing leave-N-out cross-validation and y-scrambling techniques. The test set (N = 11) presented an external prediction power of 73%. In conclusion, the QSAR model generated can be used as a valuable tool to optimize the activity of future pyrazinoic acid esters in the design of new antituberculosis agents. [source]
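The r2 and q2 statistics quoted above differ in how the residuals are obtained: r2 uses fitted values from the full model, q2 uses predictions for samples left out of the fit. A minimal sketch, assuming a single descriptor, ordinary least squares instead of PLS, and leave-one-out rather than leave-N-out; the descriptor and activity values are hypothetical, and q2 is computed as 1 − PRESS/SS_tot, which is one common definition.

```python
def fit_line(xs, ys):
    """Simple least squares line; returns (slope, intercept)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def r2_and_q2(xs, ys):
    """Return (r2, q2), with q2 from leave-one-out cross-validation."""
    n = len(ys)
    mean_y = sum(ys) / n
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    slope, intercept = fit_line(xs, ys)
    rss = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    press = 0.0
    for i in range(n):
        # Refit without sample i, then predict it.
        s, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (s * xs[i] + b)) ** 2
    return 1.0 - rss / ss_tot, 1.0 - press / ss_tot


# Hypothetical descriptor (e.g. a ClogP-like value) and activity data.
descriptor = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
activity = [4.1, 4.6, 4.8, 5.5, 5.6, 6.3, 6.4, 7.1]

r2, q2 = r2_and_q2(descriptor, activity)
```

For least squares models q2 defined this way cannot exceed r2, so a large gap between the two is usually read as a sign of overfitting.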