Regressors


Selected Abstracts


Identification and Estimation of Regression Models with Misclassification

ECONOMETRICA, Issue 3 2006
Aprajit Mahajan
This paper studies the problem of identification and estimation in nonparametric regression models with a misclassified binary regressor where the measurement error may be correlated with the regressors. We show that the regression function is nonparametrically identified in the presence of an additional random variable that is correlated with the unobserved true underlying variable but unrelated to the measurement error. Identification for semiparametric and parametric regression functions follows straightforwardly from the basic identification result. We propose a kernel estimator based on the identification strategy, derive its large sample properties, and discuss alternative estimation procedures. We also propose a test for misclassification in the model based on an exclusion restriction that is straightforward to implement. [source]


Cross Section and Panel Data Estimators for Nonseparable Models with Endogenous Regressors

ECONOMETRICA, Issue 4 2005
Joseph G. Altonji
We propose two new methods for estimating models with nonseparable errors and endogenous regressors. The first method estimates a local average response. One estimates the response of the conditional mean of the dependent variable to a change in the explanatory variable while conditioning on an external variable and then undoes the conditioning. The second method estimates the nonseparable function and the joint distribution of the observable and unobservable explanatory variables. An external variable is used to impose an equality restriction, at two points of support, on the conditional distribution of the unobservable random term given the regressor and the external variable. Our methods apply to cross sections, but our lead examples involve panel data cases in which the choice of the external variable is guided by the assumption that the distribution of the unobservable variables is exchangeable in the values of the endogenous variable for members of a group. [source]


Estimation of Nonlinear Models with Measurement Error

ECONOMETRICA, Issue 1 2004
Susanne M. Schennach
This paper presents a solution to an important econometric problem, namely the root n consistent estimation of nonlinear models with measurement errors in the explanatory variables, when one repeated observation of each mismeasured regressor is available. While a root n consistent estimator has been derived for polynomial specifications (see Hausman, Ichimura, Newey, and Powell (1991)), such an estimator for general nonlinear specifications has so far not been available. Using the additional information provided by the repeated observation, the suggested estimator separates the measurement error from the "true" value of the regressors thanks to a useful property of the Fourier transform: The Fourier transform converts the integral equations that relate the distribution of the unobserved "true" variables to the observed variables measured with error into algebraic equations. The solution to these equations yields enough information to identify arbitrary moments of the "true," unobserved variables. The value of these moments can then be used to construct any estimator that can be written in terms of moments, including traditional linear and nonlinear least squares estimators, or general extremum estimators. The proposed estimator is shown to admit a representation in terms of an influence function, thus establishing its root n consistency and asymptotic normality. Monte Carlo evidence and an application to Engel curve estimation illustrate the usefulness of this new approach. [source]


Correction for pulse height variability reduces physiological noise in functional MRI when studying spontaneous brain activity

HUMAN BRAIN MAPPING, Issue 2 2010
Petra J. van Houdt
Abstract EEG correlated functional MRI (EEG-fMRI) allows the delineation of the areas corresponding to spontaneous brain activity, such as epileptiform spikes or alpha rhythm. A major problem of fMRI analysis in general is that spurious correlations may occur because fMRI signals are not only correlated with the phenomena of interest, but also with physiological processes, like cardiac and respiratory functions. The aim of this study was to reduce the number of falsely detected activated areas by taking the variation in physiological functioning into account in the general linear model (GLM). We used the photoplethysmogram (PPG), since this signal is based on a linear combination of oxy- and deoxyhemoglobin in the arterial blood, which is also the basis of fMRI. We derived a regressor from the variation in pulse height (VIPH) of PPG and added this regressor to the GLM. When this regressor was used as predictor it appeared that VIPH explained a large part of the variance of fMRI signals acquired from five epilepsy patients and thirteen healthy volunteers. As a confounder VIPH reduced the number of activated voxels by 30% for the healthy volunteers, when studying the generators of the alpha rhythm. Although for the patients the number of activated voxels either decreased or increased, the identification of the epileptogenic zone was substantially enhanced in one out of five patients, whereas for the other patients the effects were smaller. In conclusion, applying VIPH as a confounder diminishes physiological noise and allows a more reliable interpretation of fMRI results. Hum Brain Mapp, 2010. © 2009 Wiley-Liss, Inc. [source]
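
The idea of adding a physiological regressor to the GLM can be illustrated with a minimal sketch. It assumes NumPy, uses purely synthetic data, and fits by ordinary least squares; the names `task`, `confound` and `fit_glm` are hypothetical, and this is not the authors' fMRI pipeline. It only shows how a nuisance regressor that is correlated with the regressor of interest changes the estimated effect and the residual variance.

```python
import numpy as np

def fit_glm(design, signal):
    """Ordinary least squares fit: returns coefficients and residual variance."""
    beta, _, _, _ = np.linalg.lstsq(design, signal, rcond=None)
    residuals = signal - design @ beta
    dof = design.shape[0] - design.shape[1]
    return beta, float(residuals @ residuals) / dof

rng = np.random.default_rng(0)
n_scans = 200

# Hypothetical regressor of interest and a nuisance regressor correlated with it.
task = rng.standard_normal(n_scans)
confound = 0.6 * task + rng.standard_normal(n_scans)

# Synthetic signal: true task effect 1.0, true confound effect 2.0, plus noise.
signal = 1.0 * task + 2.0 * confound + 0.5 * rng.standard_normal(n_scans)

intercept = np.ones(n_scans)
X_without = np.column_stack([intercept, task])            # confound omitted
X_with = np.column_stack([intercept, task, confound])     # confound included

beta_without, var_without = fit_glm(X_without, signal)
beta_with, var_with = fit_glm(X_with, signal)

print("task effect without confound:", beta_without[1])
print("task effect with confound:   ", beta_with[1])
print("residual variance:", var_without, "->", var_with)
```

In this toy setting, omitting the confound inflates the apparent task effect, which is the kind of spurious correlation the abstract describes.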


fMRI analysis for motor paradigms using EMG-based designs: A validation study

HUMAN BRAIN MAPPING, Issue 11 2007
Anne-Fleur van Rootselaar
Abstract The goal of the present validation study is to show that continuous surface EMG recorded simultaneously with 3T fMRI can be used to identify local brain activity related to (1) motor tasks, and to (2) muscle activity independently of a specific motor task, i.e. spontaneous (abnormal) movements. Five healthy participants performed a motor task, consisting of posture (low EMG power), and slow (medium EMG power) and fast (high EMG power) wrist flexion–extension movements. Brain activation maps derived from a conventional block design analysis (block-only design) were compared with brain activation maps derived using EMG-based regressors: (1) using the continuous EMG power as a single regressor of interest (EMG-only design) to relate motor performance and brain activity, and (2) using EMG power variability as an additional regressor in the fMRI block design analysis to relate movement variability and brain activity (mathematically) independent of the motor task. The agreement between the identified brain areas for the block-only design and the EMG-only design was excellent for all participants. Additionally, we showed that EMG power variability correlated well with activity in brain areas known to be involved in movement modulation. These innovative EMG-fMRI analysis techniques will allow the application of novel motor paradigms. This is an important step forward in the study of both the normally functioning motor system and the pathophysiological mechanisms in movement disorders. Hum Brain Mapp, 2007. © 2007 Wiley-Liss, Inc. [source]


PAIRWISE DIFFERENCE ESTIMATION WITH NONPARAMETRIC CONTROL VARIABLES

INTERNATIONAL ECONOMIC REVIEW, Issue 4 2007
Andres Aradillas-Lopez
This article extends the pairwise difference estimators for various semilinear limited dependent variable models proposed by Honoré and Powell (Identification and Inference in Econometric Models. Essays in Honor of Thomas Rothenberg Cambridge: Cambridge University Press, 2005) to permit the regressor appearing in the nonparametric component to itself depend upon a conditional expectation that is nonparametrically estimated. This permits the estimation approach to be applied to nonlinear models with sample selectivity and/or endogeneity, in which a "control variable" for selectivity or endogeneity is nonparametrically estimated. We develop the relevant asymptotic theory for the proposed estimators and we illustrate the theory to derive the asymptotic distribution of the estimator for the partially linear logit model. [source]


A smooth switching adaptive controller for linearizable systems with improved transient performance

INTERNATIONAL JOURNAL OF ADAPTIVE CONTROL AND SIGNAL PROCESSING, Issue 9 2006
Jeng Tze Huang
Abstract The certainty equivalent control has achieved asymptotic tracking stability of linearizable systems in the presence of parametric uncertainty. However, two major drawbacks remain to be tackled, namely, the risk of running into singularity for the calculated control input and the poor transient behaviour arising frequently in a general adaptive system. For the first problem, a high gain control is activated in place of the certainty equivalent control until the risk is bypassed. Among other things, it requires less control effort by taking advantage of the bounds for the input vector field. Moreover, the switching mechanism is smooth and hence avoids possible chattering behaviour. Next, to solve the second problem, a new type of update algorithm guaranteeing the exponential stability of the overall closed-loop system, under a weaker persistent excitation (PE) condition, is proposed. In particular, it requires no filtering of the regressor and hence is easier to implement. Simulation results demonstrating the validity of the proposed design are given at the end. Copyright © 2006 John Wiley & Sons, Ltd. [source]


Proteomic analysis of proteins associated with body mass and length in yellow perch, Perca flavescens

PROTEINS: STRUCTURE, FUNCTION AND BIOINFORMATICS, Issue 11 2008
John Mark Reddish
Abstract The goal of commercial yellow perch aquaculture is to increase muscle mass, which leads to increased profitability. The accumulation and degradation of muscle-specific gene products underlies the variability in body mass (BM) and length observed in pond-cultured yellow perch. Our objective was to apply a combination of statistical and proteomic technologies to identify intact and/or proteolytic fragments of muscle-specific gene products involved in muscle growth in yellow perch. Seventy yellow perch randomly selected at 10, 12, 16, 20, and 26 wk of age were euthanized; BM and length were measured and a muscle sample taken. Muscle proteins were resolved using 5–20% gradient SDS-PAGE, stained with SYPRO® Ruby and analyzed using TotalLab™ software. Data were analyzed using stepwise multiple regression with BM and length as the dependent variables and the proportional OD of each band in a sample as a potential regressor. Eight bands associated with BM (R² = 0.84) and nine bands with length (R² = 0.85) were detected. Protein sequencing by nano-LC/MS/MS identified 20 proteins/peptides associated with BM and length. These results contribute to the identification of gene products and/or proteolytic fragments associated with muscle growth in yellow perch. [source]


Counts with an endogenous binary regressor: A series expansion approach

THE ECONOMETRICS JOURNAL, Issue 1 2005
Andrés Romeu
Summary We propose an estimator for count data regression models where a binary regressor is endogenously determined. This estimator departs from previous approaches by using a flexible form for the conditional probability function of the counts. Using a Monte Carlo experiment we show that our estimator improves the fit and provides a more reliable estimate of the impact of regressors on the count when compared to alternatives which do restrict the mean to be linear-exponential. In an application to the number of trips by households in the United States, we find that the estimate of the treatment effect obtained is considerably different from the one obtained under a linear-exponential mean specification. [source]


A matrix gradient algorithm for identification of parameterized time-varying parameters

ASIAN JOURNAL OF CONTROL, Issue 1 2009
Min-Shin Chen
Abstract This paper considers the problem of estimating time-varying parameters which can be parameterized by a series of arbitrary known basis functions. It is shown that this problem is equivalent to the observer design problem for a "matrix" dynamic system. A "matrix" gradient algorithm, which mimics the well-known "vector" gradient algorithm, is proposed to estimate the unknown matrix. The contribution of this paper is to show that convergence of the proposed matrix algorithm is guaranteed by the persistent excitations of both the regressor and the basis functions. Copyright © 2009 John Wiley and Sons Asia Pte Ltd and Chinese Automatic Control Society [source]
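
For orientation, the "vector" gradient algorithm that the abstract refers to can be sketched as follows. This is a discrete-time, normalized variant for constant parameters, chosen here as an assumption for illustration; it is not the paper's matrix algorithm, and the names `gradient_estimator`, `gain` and the test signals are hypothetical. NumPy is assumed. The second run shows why persistent excitation of the regressor matters: a constant regressor leaves the parameters unidentified.

```python
import numpy as np

def gradient_estimator(regressors, outputs, gain=1.0):
    """Normalized gradient update: theta += gain * phi * error / (1 + phi'phi)."""
    theta_hat = np.zeros(regressors.shape[1])
    for phi, y in zip(regressors, outputs):
        error = y - phi @ theta_hat               # prediction error at this step
        theta_hat = theta_hat + gain * phi * error / (1.0 + phi @ phi)
    return theta_hat

steps = np.arange(2000)
theta_true = np.array([2.0, -1.0])                # hypothetical true parameters

# Persistently exciting regressor: the direction of phi keeps rotating.
rich = np.column_stack([np.sin(0.1 * steps), np.cos(0.1 * steps)])
theta_hat = gradient_estimator(rich, rich @ theta_true)

# Constant regressor: only the sum of the two parameters is observable.
flat = np.ones((len(steps), 2))
theta_flat = gradient_estimator(flat, flat @ theta_true)

print("rotating regressor:", theta_hat)    # approaches theta_true
print("constant regressor:", theta_flat)   # stalls away from theta_true
```

With the constant regressor the estimate settles on a point that reproduces the output but does not match the true parameters.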


End-of-Sample Instability Tests

ECONOMETRICA, Issue 6 2003
D. W. K. Andrews
This paper considers tests for structural instability of short duration, such as at the end of the sample. The key feature of the testing problem is that the number, m, of observations in the period of potential change is relatively small, possibly as small as one. The well-known F test of Chow (1960) for this problem only applies in a linear regression model with normally distributed iid errors and strictly exogenous regressors, even when the total number of observations, n+m, is large. We generalize the F test to cover regression models with much more general error processes, regressors that are not strictly exogenous, and estimation by instrumental variables as well as least squares. In addition, we extend the F test to nonlinear models estimated by generalized method of moments and maximum likelihood. Asymptotic critical values that are valid as n → ∞ with m fixed are provided using a subsampling-like method. The results apply quite generally to processes that are strictly stationary and ergodic under the null hypothesis of no structural instability. [source]
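
As a point of reference, the classical Chow (1960) predictive F statistic that the paper generalizes can be sketched as below. This is only the textbook statistic, valid under normal iid errors and strictly exogenous regressors, and not the authors' subsampling-based test. NumPy is assumed; the function names and the synthetic data are hypothetical. Under the null the statistic is compared with an F(m, n − k) distribution, where k is the number of regressors.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares from an ordinary least squares fit."""
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    return float(residuals @ residuals)

def chow_predictive_f(X, y, m):
    """Chow predictive statistic for instability in the last m observations."""
    n_total, k = X.shape
    n = n_total - m
    rss_n = rss(X[:n], y[:n])      # fit on the first n observations only
    rss_all = rss(X, y)            # fit on all n + m observations
    return ((rss_all - rss_n) / m) / (rss_n / (n - k))

rng = np.random.default_rng(1)
n, m = 100, 3
x = rng.standard_normal(n + m)
X = np.column_stack([np.ones(n + m), x])

y_stable = 1.0 + 2.0 * x + rng.standard_normal(n + m)
y_break = y_stable.copy()
y_break[n:] += 10.0                # hypothetical level shift at the end of the sample

f_stable = chow_predictive_f(X, y_stable, m)
f_break = chow_predictive_f(X, y_break, m)
print("F without break:", f_stable)
print("F with break:   ", f_break)
```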


A principal components regression approach to multilocus genetic association studies

GENETIC EPIDEMIOLOGY, Issue 2 2008
Kai Wang
Abstract With the rapid development of modern genotyping technology, it is becoming commonplace to genotype densely spaced genetic markers such as single nucleotide polymorphisms (SNPs) along the genome. This development has inspired a strong interest in using multiple markers located in the target region for the detection of association. We introduce a principal components (PCs) regression method for candidate gene association studies where multiple SNPs from the candidate region tend to be correlated. In this approach, the total variance in the original genotype scores is decomposed into parts that correspond to uncorrelated PCs. The PCs with the largest variances are then used as regressors in a multiple regression. Simulation studies suggest that this approach can have higher power than some popular methods. An application to CHI3L2 gene expression data confirms a significant association between CHI3L2 gene expression level and SNPs from this gene that has been previously reported by others. Genet. Epidemiol. 2008. © 2007 Wiley-Liss, Inc. [source]
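
The two steps described above, decomposing the variance into uncorrelated principal components and regressing on the components with the largest variances, can be sketched as follows. NumPy is assumed. The data are synthetic continuous stand-ins rather than real genotype scores, and `pcr_fit` and the choice of two components are hypothetical illustrations, not the authors' implementation or selection rule.

```python
import numpy as np

def pcr_fit(markers, trait, n_components):
    """Principal components regression: PCA on the markers, then OLS on the scores."""
    centered = markers - markers.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)               # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]     # keep the largest variances
    scores = centered @ eigvecs[:, order]                # uncorrelated PC scores
    design = np.column_stack([np.ones(len(trait)), scores])
    coef, _, _, _ = np.linalg.lstsq(design, trait, rcond=None)
    return coef, scores, eigvals[order]

rng = np.random.default_rng(2)
n_subjects, n_markers = 300, 6

# Correlated markers driven by one shared latent factor (synthetic).
latent = rng.standard_normal(n_subjects)
markers = latent[:, None] + 0.3 * rng.standard_normal((n_subjects, n_markers))
trait = 0.5 * latent + rng.standard_normal(n_subjects)

coef, scores, variances = pcr_fit(markers, trait, n_components=2)
score_corr = np.corrcoef(scores, rowvar=False)

print("PC variances:", variances)
print("correlation between PC scores:", score_corr[0, 1])
print("regression coefficients:", coef)
```

Because the scores are uncorrelated by construction, the regression avoids the collinearity that the original correlated markers would introduce.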


Characterization of cataclastic shear-zones of the KTB deep drill hole by regression analysis of drill cuttings data

GEOPHYSICAL JOURNAL INTERNATIONAL, Issue 1 2002
Helmuth Winter
Summary During the course of the German continental deep drilling project (KTB) two scientific drill holes were drilled, the KTB Vorbohrung down to 4 km and the KTB Hauptbohrung down to 9.1 km, both intersecting several cataclastic shear-zones. As few drill cores were available in the KTB Hauptbohrung, most of the petrophysical and geochemical data are based on drill cuttings investigations. We present an analysis of drill cuttings data, addressing the question of what relationship between cataclastic shear-zones and petrophysical and geochemical data can be revealed. For that purpose we developed a regression model with the amount of cataclastic rocks in drill cuttings as a dependent variable and the petrophysical and geochemical variables as regressors. We use depth related data from two sections of the KTB Hauptbohrung with cataclastic shear-zones in gneiss (1738–2380 m) and in metabasite (4524–4908 m). The variables are selected by estimating and testing a linear regression model taking into account the autocorrelation of the data due to the depth structure. The variables which characterize the cataclastic shear-zones in gneiss according to our model are the contents of carbon and crystal water and the thermal conductivity, each with positive coefficients. This model explains, in total, 57 per cent of the variance of the observed data. For cataclastic shear-zones in metabasite the content of crystal water and the magnetic susceptibility with positive coefficients and the content of chromium with a negative coefficient are the significant variables. The explained variance in this model is 60 per cent. Being significant in both lithologies, the content of crystal water is an important variable for cataclastic shear-zones. The prediction of shear zones is feasible by our methods, but the results of our study should be confirmed and widened by investigations of other data sets. [source]


A Statistical Estimator of the Spatial Distribution of the Water-Table Altitude

GROUND WATER, Issue 1 2003
Nicasio Sepúlveda
An algorithm was designed to statistically estimate the areal distribution of water-table altitude. The altitude of the water table was bounded below by the minimum water-table surface and above by the land surface. Using lake elevations and stream stages, and interpolating between lakes and streams, the minimum water-table surface was generated. A multiple linear regression among the minimum water-table altitude, the difference between land-surface and minimum water-table altitudes, and the water-level measurements from surficial aquifer system wells resulted in a consistently high correlation for all groups of physiographic regions in Florida. A simple linear regression between land-surface and water-level measurements resulted in a root-mean-square residual of 4.23 m, with residuals ranging from −8.78 to 41.54 m. A simple linear regression between the minimum water table and the water-level measurements resulted in a root-mean-square residual of 1.45 m, with residuals ranging from −7.39 to 4.10 m. The application of the multiple linear regression presented herein resulted in a root-mean-square residual of 1.05 m, with residuals ranging from −5.24 to 5.63 m. Results from complete and partial F tests rejected the hypothesis of eliminating any of the regressors in the multiple linear regression presented in this study. [source]
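
The comparison of root-mean-square residuals and the partial F test mentioned above can be sketched on synthetic data. NumPy is assumed; every number below is made up for illustration and none reproduces the Florida data, and the variable names are hypothetical. The partial F statistic compares a reduced model (minimum water table only) with the full model (minimum water table plus the land-surface gap).

```python
import numpy as np

def ols_residuals(X, y):
    """Residuals from an ordinary least squares fit of y on X."""
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def rms(residuals):
    """Root-mean-square residual."""
    return float(np.sqrt(np.mean(residuals ** 2)))

rng = np.random.default_rng(3)
n_wells = 250

# Synthetic altitudes in metres (hypothetical values).
min_water_table = rng.uniform(0.0, 30.0, n_wells)
land_surface = min_water_table + rng.uniform(0.5, 15.0, n_wells)
gap = land_surface - min_water_table
water_level = min_water_table + 0.3 * gap + rng.normal(0.0, 1.0, n_wells)

ones = np.ones(n_wells)
X_full = np.column_stack([ones, min_water_table, gap])
X_reduced = np.column_stack([ones, min_water_table])

res_full = ols_residuals(X_full, water_level)
res_reduced = ols_residuals(X_reduced, water_level)

# Partial F statistic for dropping the gap regressor from the full model.
rss_full = float(res_full @ res_full)
rss_reduced = float(res_reduced @ res_reduced)
q = X_full.shape[1] - X_reduced.shape[1]
partial_f = ((rss_reduced - rss_full) / q) / (rss_full / (n_wells - X_full.shape[1]))

print("RMS residual, reduced model:", rms(res_reduced))
print("RMS residual, full model:   ", rms(res_full))
print("partial F for the gap regressor:", partial_f)
```

A large partial F value indicates that removing the regressor would significantly worsen the fit, which is the sense in which the abstract reports that no regressor could be eliminated.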


Factors Affecting Plan Choice and Unmet Need among Supplemental Security Income Eligible Children with Disabilities

HEALTH SERVICES RESEARCH, Issue 5p1 2005
Jean M. Mitchell
Objective. To evaluate factors affecting plan choice (partially capitated managed care [MC] option versus the fee-for-service [FFS] system) and unmet needs for health care services among children who qualified for supplemental security income (SSI) because of a disability. Data Sources. We conducted telephone interviews during the summer and fall of 2002 with a random sample of close to 1,088 caregivers of SSI eligible children who resided in the District of Columbia. Research Design. We employed a two-step procedure where we first estimated plan choice and then constructed a selectivity correction to control for the potential selection bias associated with plan choice. We included the selectivity correction, the dummy variable indicating plan choice and other exogenous regressors in the second stage equations predicting unmet need. The dependent variables in the second stage equations include: (1) having an unmet need for any service or equipment; (2) having an unmet need for physician or hospital services; (3) having an unmet need for medical equipment; (4) having an unmet need for prescription drugs; (5) having an unmet need for dental care. Principal Findings. More disabled children (those with birth defects, chronic conditions, and/or more limitations in activities of daily living) were more likely to enroll in FFS. Children of caregivers with some college education were more likely to opt for FFS, whereas children from higher income households were more prone to enroll in the partially capitated MC plan. Children in FFS were 9.9 percentage points more likely than children enrolled in partially capitated MC to experience an unmet need for any type of health care services (p<.01), while FFS children were 4.5 percentage points more likely than partially capitated MC enrollees to incur a medical equipment unmet need (p<.05). FFS children were also more likely than partially capitated MC enrollees to experience unmet needs for prescription drugs and dental care; however, these differences were only marginally significant. Conclusions. We speculate that the case management services available under the MC option, low Medicaid FFS reimbursements and provider availability account for some of the differences in unmet need that exist between partially capitated MC and FFS enrollees. [source]


Identification of the inertia matrix of a rotating body based on errors-in-variables models

INTERNATIONAL JOURNAL OF ADAPTIVE CONTROL AND SIGNAL PROCESSING, Issue 3 2010
Byung-Eul Jun
Abstract This paper proposes a procedure for identifying the inertia matrix of a rotating body. The procedure based on Euler's equation governing rotational motion assumes errors-in-variables models in which all measurements, torque as well as angular velocities, are corrupted by noises. In order for consistent estimation, we introduce an extended linear regression model by augmenting the regressors with constants and the parameters with noise-contributed terms. A transformation, based on low-pass filtering, of the extended model cancels out angular acceleration terms in the regressors. Applying the method of least correlation to the model identifies the elements of the inertia matrix. Analysis shows that the estimates converge to the true parameters as the number of samples increases to infinity. Monte Carlo simulations demonstrate the performance of the algorithm and support the analytical consistency. Copyright © 2009 John Wiley & Sons, Ltd. [source]


A semiparametric model for binary response and continuous outcomes under index heteroscedasticity

JOURNAL OF APPLIED ECONOMETRICS, Issue 5 2009
Roger Klein
This paper formulates a likelihood-based estimator for a double-index, semiparametric binary response equation. A novel feature of this estimator is that it is based on density estimation under local smoothing. While the proofs differ from those based on alternative density estimators, the finite sample performance of the estimator is significantly improved. As binary responses often appear as endogenous regressors in continuous outcome equations, we also develop an optimal instrumental variables estimator in this context. For this purpose, we specialize the double-index model for binary response to one with heteroscedasticity that depends on an index different from that underlying the 'mean response'. We show that such (multiplicative) heteroscedasticity, whose form is not parametrically specified, effectively induces exclusion restrictions on the outcomes equation. The estimator developed exploits such identifying information. We provide simulation evidence on the favorable performance of the estimators and illustrate their use through an empirical application on the determinants, and effect, of attendance at a government-financed school. Copyright © 2009 John Wiley & Sons, Ltd. [source]


Jointness of growth determinants

JOURNAL OF APPLIED ECONOMETRICS, Issue 2 2009
Gernot Doppelhofer
This paper introduces a new measure of dependence or jointness among explanatory variables. Jointness is based on the joint posterior distribution of variables over the model space, thereby taking model uncertainty into account. By looking beyond marginal measures of variable importance, jointness reveals generally unknown forms of dependence. Positive jointness implies that regressors are complements, representing distinct but mutually reinforcing effects. Negative jointness implies that explanatory variables are substitutes and capture similar underlying effects. In a cross-country dataset we show that jointness among 67 determinants of growth is important, affecting inference and informing economic policy. Copyright © 2009 John Wiley & Sons, Ltd. [source]


Unemployment and liquidity constraints

JOURNAL OF APPLIED ECONOMETRICS, Issue 3 2007
Vassilis A. Hajivassiliou
We present a dynamic framework for the interaction between borrowing (liquidity) constraints and deviations of actual hours from desired hours, both measured by discrete-valued indicators, and estimate it as a system of dynamic binary and ordered probit models with panel data from the Panel Study of Income Dynamics. We analyze a household's propensity to be liquidity constrained by means of a dynamic binary probit model. We analyze qualitative aspects of the conditions of employment, namely whether the household head is involuntarily overemployed, voluntarily employed, or involuntarily underemployed or unemployed, by means of a dynamic ordered probit model. We focus on the possible interaction between the two types of constraints. We estimate these models jointly using maximum simulated likelihood, where we allow for individual random effects along with an autoregressive process for the general error term in each equation. A novel feature of our method is that it allows for the random effects to be correlated with regressors in a time-invariant fashion. Our results provide strong support for the basic theory of constrained behavior and the interaction between liquidity constraints and exogenous constraints on labor supply. Copyright © 2007 John Wiley & Sons, Ltd. [source]


Model uncertainty in cross-country growth regressions

JOURNAL OF APPLIED ECONOMETRICS, Issue 5 2001
Carmen Fernández
We investigate the issue of model uncertainty in cross-country growth regressions using Bayesian Model Averaging (BMA). We find that the posterior probability is spread widely among many models, suggesting the superiority of BMA over choosing any single model. Out-of-sample predictive results support this claim. In contrast to Levine and Renelt (1992), our results broadly support the more 'optimistic' conclusion of Sala-i-Martin (1997b), namely that some variables are important regressors for explaining cross-country growth patterns. However, care should be taken in the methodology employed. The approach proposed here is firmly grounded in statistical theory and immediately leads to posterior and predictive inference. Copyright © 2001 John Wiley & Sons, Ltd. [source]


Robust methods for partial least squares regression

JOURNAL OF CHEMOMETRICS, Issue 10 2003
M. Hubert
Abstract Partial least squares regression (PLSR) is a linear regression technique developed to deal with high-dimensional regressors and one or several response variables. In this paper we introduce robustified versions of the SIMPLS algorithm, this being the leading PLSR algorithm because of its speed and efficiency. Because SIMPLS is based on the empirical cross-covariance matrix between the response variables and the regressors and on linear least squares regression, the results are affected by abnormal observations in the data set. Two robust methods, RSIMCD and RSIMPLS, are constructed from a robust covariance matrix for high-dimensional data and robust linear regression. We introduce robust RMSECV and RMSEP values for model calibration and model validation. Diagnostic plots are constructed to visualize and classify the outliers. Several simulation results and the analysis of real data sets show the effectiveness and robustness of the new approaches. Because RSIMPLS is roughly twice as fast as RSIMCD, it stands out as the overall best method. Copyright © 2003 John Wiley & Sons, Ltd. [source]


A robust PCR method for high-dimensional regressors

JOURNAL OF CHEMOMETRICS, Issue 8-9 2003
Mia Hubert
Abstract We consider the multivariate calibration model which assumes that the concentrations of several constituents of a sample are linearly related to its spectrum. Principal component regression (PCR) is widely used for the estimation of the regression parameters in this model. In the classical approach it combines principal component analysis (PCA) on the regressors with least squares regression. However, both stages yield very unreliable results when the data set contains outlying observations. We present a robust PCR (RPCR) method which also consists of two parts. First we apply a robust PCA method for high-dimensional data on the regressors, then we regress the response variables on the scores using a robust regression method. A robust RMSECV value and a robust R² value are proposed as exploratory tools to select the number of principal components. The prediction error is also estimated in a robust way. Moreover, we introduce several diagnostic plots which are helpful to visualize and classify the outliers. The robustness of RPCR is demonstrated through simulations and the analysis of a real data set. Copyright © 2003 John Wiley & Sons, Ltd. [source]
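
The two-stage structure of classical PCR (PCA on the regressors, then least squares on the scores) can be sketched as follows. This is the classical, non-robust version only; the robust PCA and robust regression steps of RPCR are not reproduced here, and the names `pcr_fit` and `pcr_predict` are hypothetical.

```python
import numpy as np

def pcr_fit(X, y, n_components):
    """Classical principal component regression (non-robust sketch)."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    # Stage 1: PCA via SVD of the centered regressors; rows of Vt are loadings
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T          # loadings, shape (p, n_components)
    T = Xc @ P                       # scores, shape (n, n_components)
    # Stage 2: least squares regression of the centered response on the scores
    gamma, *_ = np.linalg.lstsq(T, y - y_mean, rcond=None)
    # Map the score coefficients back to the original regressors
    beta = P @ gamma
    intercept = y_mean - x_mean @ beta
    return beta, intercept

def pcr_predict(X, beta, intercept):
    """Predict responses for new rows of X."""
    return X @ beta + intercept
```

When `n_components` equals the rank of the centered regressor matrix, this reduces to ordinary least squares; using fewer components trades some bias for lower variance in high-dimensional settings.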


Forecast covariances in the linear multiregression dynamic model

JOURNAL OF FORECASTING, Issue 2 2008
Catriona M. Queen
Abstract The linear multiregression dynamic model (LMDM) is a Bayesian dynamic model which preserves any conditional independence and causal structure across a multivariate time series. The conditional independence structure is used to model the multivariate series by separate (conditional) univariate dynamic linear models, where each series has contemporaneous variables as regressors in its model. Calculating the forecast covariance matrix (which is required for calculating forecast variances in the LMDM) is not always straightforward in its current formulation. In this paper we introduce a simple algebraic form for calculating LMDM forecast covariances. Calculation of the covariance between model regression components can also be useful and we shall present a simple algebraic method for calculating these component covariances. In the LMDM formulation, certain pairs of series are constrained to have zero forecast covariance. We shall also introduce a possible method to relax this restriction. Copyright © 2008 John Wiley & Sons, Ltd. [source]


An improved independent component regression modeling and quantitative calibration procedure

AICHE JOURNAL, Issue 6 2010
Chunhui Zhao
Abstract An improved independent component regression (M-ICR) algorithm is proposed by constructing joint latent variable (LV) based regressors, and a quantitative statistical analysis procedure is designed using a bootstrap technique for model validation and performance evaluation. First, the drawbacks of the conventional regression modeling algorithms are analyzed. Then the proposed M-ICR algorithm is formulated for regressor design. It constructs a dual-objective optimization criterion function, simultaneously incorporating quality-relevance and independence into the feature extraction procedure. This ties together the ideas of partial least squares (PLS) and independent component regression (ICR) under the same mathematical umbrella. By adjusting the controllable suboptimization objective weights, it adds insight into the different roles of quality-relevant and independent characteristics in calibration modeling and thus provides possibilities to combine the advantages of PLS and ICR. Furthermore, a quantitative statistical analysis procedure based on a bootstrapping technique is designed to identify the effects of LVs, determine a better model rank and overcome ill-conditioning caused by model over-parameterization. A confidence interval on quality prediction is also approximated. The performance of the proposed method is demonstrated using both numerical and real-world data. © 2009 American Institute of Chemical Engineers AIChE J, 2010 [source]


Bayesian modelling of catch in a north-west Atlantic fishery

JOURNAL OF THE ROYAL STATISTICAL SOCIETY: SERIES C (APPLIED STATISTICS), Issue 3 2002
Carmen Fernández
Summary. We model daily catches of fishing boats in the Grand Bank fishing grounds. We use data on catches per species for a number of vessels collected by the European Union in the context of the Northwest Atlantic Fisheries Organization. Many variables can be thought to influence the amount caught: a number of ship characteristics (such as the size of the ship, the fishing technique used and the mesh size of the nets) are obvious candidates, but one can also consider the season or the actual location of the catch. Our database leads to 28 possible regressors (arising from six continuous variables and four categorical variables, whose 22 levels are treated separately), resulting in a set of 177 million possible linear regression models for the log-catch. Zero observations are modelled separately through a probit model. Inference is based on Bayesian model averaging, using a Markov chain Monte Carlo approach. Particular attention is paid to the prediction of catches for single and aggregated ships. [source]


Genetic evaluation of dairy cattle using a simple heritable genetic ground

JOURNAL OF THE SCIENCE OF FOOD AND AGRICULTURE, Issue 11 2010
Josef Pribyl
Abstract The evaluation of an animal is based on production records, adjusted for environmental effects, which gives a reliable estimation of its breeding value. Highly reliable daughter yield deviations are used as inputs for genetic marker evaluation. Genetic variability is explained by particular loci and background polygenes, both of which are described by the genomic breeding value selection index. Automated genotyping enables the determination of many single-nucleotide polymorphisms (SNPs) and can increase the reliability of evaluation of young animals (from 0.30 if only the pedigree value is used to 0.60 when the genomic breeding value is applied). However, the introduction of SNPs requires a mixed model with a large number of regressors, in turn requiring new algorithms for the best linear unbiased prediction and BayesB. Here, we discuss a method that uses a genomic relationship matrix to estimate the genomic breeding value of animals directly, without regressors. A one-step procedure evaluates both genotyped and ungenotyped animals at the same time, and produces one common ranking of all animals in a whole population. An augmented pedigree–genomic relationship matrix and the removal of prerequisites produce more accurate evaluations of all connected animals. Copyright © 2010 Society of Chemical Industry [source]


Cointegrating regressions with messy regressors and an application to mixed-frequency series

JOURNAL OF TIME SERIES ANALYSIS, Issue 4 2010
J. Isaac Miller
JEL classification: C13; C14; C32. We consider a cointegrating regression in which the integrated regressors are messy in the sense that they contain data that may be mismeasured, missing, observed at mixed frequencies or have other irregularities that cause the econometrician to observe them with mildly nonstationary noise. Least squares estimation of the cointegrating vector is consistent. Existing prototypical variance-based estimation techniques, such as canonical cointegrating regression, are both consistent and asymptotically mixed normal. This result is robust to weakly dependent but possibly nonstationary disturbances. [source]