Regression Problems

Selected Abstracts


Support vector machines-based generalized predictive control

INTERNATIONAL JOURNAL OF ROBUST AND NONLINEAR CONTROL, Issue 17 2006
S. Iplikci
Abstract In this study, we propose a novel control methodology that introduces support vector machines (SVMs) into the generalized predictive control (GPC) scheme. SVM regression algorithms have been used extensively for modelling nonlinear systems because they guarantee a global solution, achieved by transforming the regression problem into a convex optimization problem in dual space, and because of their strong generalization ability. These key features of SVM structures lead us to employ an SVM model of an unknown plant within the GPC context. In particular, the SVM model can supply gradient information and predict the future trajectory of the plant output, both of which are needed in the cost-function minimization block. Simulations confirm that the proposed SVM-based GPC scheme provides noticeably high control performance: an unknown nonlinear plant controlled by SVM-based GPC can accurately track reference inputs of different shapes. Moreover, the proposed scheme maintains its control performance under noisy conditions. Copyright © 2006 John Wiley & Sons, Ltd. [source]
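The scheme the abstract describes can be sketched in a few lines. The sketch below is not the authors' controller: it substitutes kernel ridge regression for the SVM (the same dual-form kernel expansion, minus the epsilon-insensitive loss), uses a toy first-order plant, and minimizes the GPC cost by a coarse grid search over a constant control input rather than a proper optimizer.

```python
import numpy as np

def rbf(a, b, gamma=0.5):
    # Pairwise RBF kernel between row-sets a and b.
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# --- Identify the unknown plant from input/output data.  Kernel ridge
#     stands in for SVR here: same dual-form kernel expansion. ---
rng = np.random.default_rng(0)
u_train = rng.uniform(-1, 1, 200)
y = np.zeros(201)
for k in range(200):                        # toy nonlinear plant (assumed)
    y[k + 1] = 0.6 * y[k] + np.tanh(u_train[k])
X = np.column_stack([y[:-1], u_train])      # features: (y_k, u_k)
t = y[1:]                                   # target:   y_{k+1}
K = rbf(X, X)
alpha = np.linalg.solve(K + 1e-3 * np.eye(len(K)), t)   # dual weights

def predict(y_k, u_k):
    # One-step-ahead prediction from the learned plant model.
    x = np.array([[y_k, u_k]])
    return (rbf(x, X) @ alpha)[0]

# --- GPC step: pick the input whose N-step predicted trajectory
#     minimizes tracking error plus control effort. ---
def gpc_control(y_k, ref, N=5, lam=0.01):
    best_u, best_cost = 0.0, np.inf
    for u in np.linspace(-1, 1, 41):        # coarse grid, not a real optimizer
        yh, cost = y_k, 0.0
        for _ in range(N):
            yh = predict(yh, u)
            cost += (ref - yh) ** 2
        cost += lam * N * u ** 2
        if cost < best_cost:
            best_u, best_cost = u, cost
    return best_u

u_star = gpc_control(0.0, ref=0.8)
```

Since the toy plant has positive gain and the reference is positive, the selected input should be a moderate positive value.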


A Bayesian regression approach to terrain mapping and an application to legged robot locomotion

JOURNAL OF FIELD ROBOTICS (FORMERLY JOURNAL OF ROBOTIC SYSTEMS), Issue 10 2009
Christian Plagemann
We deal with the problem of learning probabilistic models of terrain surfaces from sparse and noisy elevation measurements. The key idea is to formalize this as a regression problem and to derive a solution based on nonstationary Gaussian processes. We describe how to achieve a sparse approximation of the model, which makes the model applicable to real-world data sets. The main benefits of our model are that (1) it does not require a discretization of space, (2) it also provides the uncertainty for its predictions, and (3) it adapts its covariance function to the observed data, allowing more accurate inference of terrain elevation at points that have not been observed directly. As a second contribution, we describe how a legged robot equipped with a laser range finder can utilize the developed terrain model to plan and execute a path over rough terrain. We show how a motion planner can use the learned terrain model to plan a path to a goal location, using a terrain-specific cost model to accept or reject candidate footholds. To the best of our knowledge, this was the first legged robotics system to autonomously sense, plan, and traverse a terrain surface of the given complexity. © 2009 Wiley Periodicals, Inc. [source]
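A minimal 1-D version of the Gaussian process regression step can illustrate benefit (2), the per-point predictive uncertainty. This sketch assumes a fixed stationary squared-exponential kernel and synthetic elevation data; the paper's model is nonstationary, two-dimensional, and sparsified.

```python
import numpy as np

def sq_exp(a, b, ell=1.0, sf2=1.0):
    # Stationary squared-exponential covariance (the paper adapts a
    # nonstationary one; a fixed kernel keeps the sketch short).
    d2 = (a[:, None] - b[None, :]) ** 2
    return sf2 * np.exp(-0.5 * d2 / ell ** 2)

# Sparse, noisy 1-D "elevation" measurements (synthetic stand-ins).
rng = np.random.default_rng(1)
x_obs = rng.uniform(0, 10, 30)
z_obs = np.sin(x_obs) + 0.1 * rng.standard_normal(30)

noise = 0.1 ** 2
K = sq_exp(x_obs, x_obs) + noise * np.eye(30)
L = np.linalg.cholesky(K)
alpha = np.linalg.solve(L.T, np.linalg.solve(L, z_obs))

# Posterior mean and variance at unobserved points: the variance is
# exactly the per-prediction uncertainty the abstract highlights.
x_new = np.linspace(0, 10, 101)
Ks = sq_exp(x_new, x_obs)
mean = Ks @ alpha
v = np.linalg.solve(L, Ks.T)
var = sq_exp(x_new, x_new).diagonal() - (v ** 2).sum(axis=0)
```

The variance grows away from the measurements, which is what lets a planner distrust terrain estimates in sparsely observed regions.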


Estimating the long memory Granger causality effect with a spectrum estimator

JOURNAL OF FORECASTING, Issue 3 2006
Wen-Den Chen
Abstract This paper discusses the Granger causality test using a spectrum estimator that allows the transfer function to have long memory properties. In the traditional methodology, the relationship among variables is usually assumed to be short memory or contemporaneous; hence we must make sure the variables are of the same integrated order, or a spurious regression problem may arise. In practice, not all the variables in an economic model are fractionally co-integrated: they may share the same random sources but have different integrated orders. This paper focuses on how to capture the long memory Granger causality effect in the transfer function, without necessarily assuming that the variables have the same fractional integrated order. Moreover, from the transfer function we construct an estimator to test for the long memory effect in the Granger causality sense. Copyright © 2006 John Wiley & Sons, Ltd. [source]
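The "spurious regression problem" mentioned above is easy to reproduce: regressing one random walk on an independent one often yields a deceptively large R², which collapses once both series are differenced to the same integrated order. A small synthetic illustration (not the paper's spectral estimator):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500
x = np.cumsum(rng.standard_normal(n))   # two independent I(1) random walks
y = np.cumsum(rng.standard_normal(n))

def r2(a, b):
    # R^2 of a simple OLS regression of b on a (with intercept).
    A = np.column_stack([np.ones_like(a), a])
    coef, *_ = np.linalg.lstsq(A, b, rcond=None)
    resid = b - A @ coef
    return 1 - resid.var() / b.var()

r2_levels = r2(x, y)                     # frequently spuriously large
r2_diffs = r2(np.diff(x), np.diff(y))    # near zero, as it should be
```

The differenced series are plain white noise, so their R² hovers around 1/n regardless of how impressive the levels regression looks.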


A mathematical programming approach for improving the robustness of least sum of absolute deviations regression

NAVAL RESEARCH LOGISTICS: AN INTERNATIONAL JOURNAL, Issue 4 2006
Avi Giloni
Abstract This paper discusses a novel application of mathematical programming techniques to a regression problem. While least squares regression techniques have been used for a long time, it is known that their robustness properties are not desirable. Specifically, the estimators are known to be too sensitive to data contamination. In this paper we examine regressions based on Least-sum of Absolute Deviations (LAD) and show that the robustness of the estimator can be improved significantly through a judicious choice of weights. The problem of finding optimum weights is formulated as a nonlinear mixed integer program, which is too difficult to solve exactly in general. We demonstrate that our problem is equivalent to a mathematical program with a single functional constraint resembling the knapsack problem and then solve it for a special case. We then generalize this solution to general regression designs. Furthermore, we provide an efficient algorithm to solve the general nonlinear, mixed integer programming problem when the number of predictors is small. We show the efficacy of the weighted LAD estimator using numerical examples. © 2006 Wiley Periodicals, Inc. Naval Research Logistics, 2006 [source]
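For contrast with the paper's mixed-integer formulation, a plain (unweighted) LAD fit can be approximated cheaply by iteratively reweighted least squares. The sketch below is that baseline, not the authors' weighted estimator; the single contaminated point shows the robustness gap between LAD and ordinary least squares that motivates the paper.

```python
import numpy as np

def lad_irls(X, y, iters=50, eps=1e-6):
    # Approximate least-absolute-deviations fit by iteratively
    # reweighted least squares: weight_i = 1 / max(|resid_i|, eps).
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)        # start from OLS
    for _ in range(iters):
        w = 1.0 / np.maximum(np.abs(y - A @ beta), eps)
        sw = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(sw[:, None] * A, sw * y, rcond=None)
    return beta

# Clean line y = 1 + 2x with one gross outlier: LAD shrugs it off,
# while ordinary least squares is dragged toward the bad point.
x = np.arange(10, dtype=float)
y = 1.0 + 2.0 * x
y[9] = 100.0                        # contaminated observation
b_lad = lad_irls(x, y)
b_ols, *_ = np.linalg.lstsq(np.column_stack([np.ones(10), x]), y,
                            rcond=None)
```

With nine points exactly on the line, the LAD optimum is the original line, while the OLS slope is pulled far above 2 by the high-leverage outlier.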


Bootstrapping regression models with BLUS residuals

THE CANADIAN JOURNAL OF STATISTICS, Issue 1 2000
Michèle Grenier
Abstract To bootstrap a regression problem, pairs of response and explanatory variables, or residuals, can be resampled, according to whether we believe that the explanatory variables are random or fixed. In the latter case, different residuals have been proposed in the literature, including the ordinary residuals (Efron 1979), standardized residuals (Bickel & Freedman 1983) and Studentized residuals (Weber 1984). Freedman (1981) has shown that the bootstrap from ordinary residuals is asymptotically valid when the number of cases increases and the number of variables is fixed. Bickel & Freedman (1983) have shown the asymptotic validity for ordinary residuals when the number of variables and the number of cases both increase, provided that the ratio of the two converges to zero at an appropriate rate. In this paper, the authors introduce the use of BLUS (Best Linear Unbiased with Scalar covariance matrix) residuals in bootstrapping regression models. The main advantage of the BLUS residuals, introduced in Theil (1965), is that they are uncorrelated. The main disadvantage is that only n − p residuals can be computed for a regression problem with n cases and p variables. The asymptotic results of Freedman (1981) and Bickel & Freedman (1983) for the ordinary (and standardized) residuals are generalized to the BLUS residuals. A small simulation study shows that even though only n − p residuals are available, in small samples bootstrapping BLUS residuals can be as good as, and sometimes better than, bootstrapping from standardized or Studentized residuals. [source]
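The fixed-design residual bootstrap discussed above (with ordinary residuals, per Efron 1979) can be sketched as follows. BLUS residuals are omitted, so this is the baseline the paper improves on:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 40
x = np.linspace(0, 1, n)
y = 2.0 + 3.0 * x + 0.5 * rng.standard_normal(n)   # synthetic data

A = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
fitted = A @ beta_hat
resid = y - fitted                  # ordinary residuals (Efron 1979)
resid = resid - resid.mean()        # centre before resampling

# Fixed-design bootstrap: resample residuals, rebuild responses, refit.
B = 500
slopes = np.empty(B)
for b in range(B):
    y_star = fitted + rng.choice(resid, size=n, replace=True)
    bb, *_ = np.linalg.lstsq(A, y_star, rcond=None)
    slopes[b] = bb[1]

slope_se = slopes.std(ddof=1)       # bootstrap standard error of the slope
```

All n residuals are resampled here; the BLUS variant would draw from the n − p uncorrelated residuals instead.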


Space varying coefficient models for small area data

ENVIRONMETRICS, Issue 5 2003
Renato M. Assunção
Abstract Many spatial regression problems using area data require more flexible forms than the usual linear predictor for modelling the dependence of responses on covariates. One direction for doing this is to allow the coefficients to vary as smooth functions of the area's geographical location. After presenting examples from the scientific literature where these spatially varying coefficients are justified, we briefly review some of the available alternatives for this kind of modelling. We concentrate on a Bayesian approach for generalized linear models proposed by the author which uses a Markov random field to model the coefficients' spatial dependency. We show that, for normally distributed data, Gibbs sampling can be used to sample from the posterior and we prove a result showing the equivalence between our model and other usual spatial regression models. We illustrate our approach with a number of rather complex applied problems, showing that the method is computationally feasible and provides useful insights in substantive problems. Copyright © 2003 John Wiley & Sons, Ltd. [source]


Support vector regression to predict asphalt mix performance

INTERNATIONAL JOURNAL FOR NUMERICAL AND ANALYTICAL METHODS IN GEOMECHANICS, Issue 16 2008
Maher Maalouf
Abstract Material properties are essential in the design and evaluation of pavements. In this paper, the potential of the support vector regression (SVR) algorithm is explored for predicting the resilient modulus (MR), an essential property of pavement materials, particularly the hot mix asphalt typically used in Oklahoma. SVR is a statistical learning algorithm for regression problems; in our study, SVR proved superior to least squares (LS). Compared with the widely used LS method, the results show that SVR significantly reduces the mean-squared error and improves the correlation coefficient. Copyright © 2008 John Wiley & Sons, Ltd. [source]
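The SVR-versus-LS comparison can be mimicked on synthetic data. The sketch below stands in kernel ridge regression for epsilon-SVR (same RBF dual form, simpler loss) and uses an arbitrary nonlinear target rather than real resilient modulus measurements:

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.uniform(0, 3, 80)                            # synthetic predictor
t = np.exp(-x) * np.sin(3 * x) + 0.05 * rng.standard_normal(80)

# Least-squares straight line.
A = np.column_stack([np.ones(80), x])
beta, *_ = np.linalg.lstsq(A, t, rcond=None)
mse_ls = np.mean((t - A @ beta) ** 2)

# Kernel machine in dual form (standing in for epsilon-SVR).
K = np.exp(-4.0 * (x[:, None] - x[None, :]) ** 2)    # RBF Gram matrix
alpha = np.linalg.solve(K + 1e-2 * np.eye(80), t)    # regularized dual fit
mse_svr = np.mean((t - K @ alpha) ** 2)
```

On a nonlinear response the kernel model's mean-squared error is far below the linear LS fit, mirroring the qualitative finding of the paper.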


Effective database processing for classification and regression with continuous variables

INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, Issue 12 2007
E. Di Tomaso
This article proposes a method for manipulating a database of instances involving both discrete and continuous variables. A fuzzy partition is used to discretize continuous domains. A reorganized form of representing a relational database, called an effective database, is proposed. The effective database is tested on classification and regression problems using general Bayesian networks and Naïve Bayes classifiers. The structures and parameters of the classifiers are estimated from the effective database. An algorithm for updating with soft evidence is used to test the induced models when continuous variables are present. The experiments show that the effective-database procedure selects relevant information from the data, which in some cases improves the prediction accuracy of the classifiers. © 2007 Wiley Periodicals, Inc. Int J Int Syst 22: 1271–1285, 2007. [source]
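The fuzzy partition used to discretize continuous domains can be illustrated with triangular membership functions. This is a generic Ruspini-style partition, assumed here for illustration; the abstract does not specify the article's exact membership shapes.

```python
import numpy as np

def fuzzy_partition(x, centres):
    # Triangular membership functions over a continuous domain.  Each
    # value receives graded membership in its two neighbouring
    # intervals, and memberships sum to one (a Ruspini partition).
    c = np.asarray(centres, dtype=float)
    mu = np.zeros((len(x), len(c)))
    for j, cj in enumerate(c):
        left = c[j - 1] if j > 0 else cj
        right = c[j + 1] if j < len(c) - 1 else cj
        for i, xi in enumerate(x):
            if xi == cj:
                mu[i, j] = 1.0
            elif left < xi < cj:
                mu[i, j] = (xi - left) / (cj - left)
            elif cj < xi < right:
                mu[i, j] = (right - xi) / (right - cj)
    return mu

x = np.array([0.0, 0.25, 0.5, 0.9, 1.0])
mu = fuzzy_partition(x, centres=[0.0, 0.5, 1.0])
```

A value of 0.25 is, for instance, half a member of the interval centred at 0.0 and half a member of the one centred at 0.5, instead of being forced into a single crisp bin.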


Assessment of four modifications of a novel indexing technique for case-based reasoning

INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, Issue 4 2007
Mykola Galushka
In this article, we investigate four variations (D-HSM, D-HSW, D-HSE, and D-HSEW) of a novel indexing technique called D-HS, designed for use in case-based reasoning (CBR) systems. All D-HS modifications are based on a matrix of cases indexed by their discretized attribute values. The main differences between them lie in their attribute discretization strategy and similarity metric. D-HSM uses a fixed number of intervals and simple intersection as its similarity metric; D-HSW uses the same discretization approach and a weighted intersection; D-HSE uses information gain to define the intervals and simple intersection as its similarity metric; D-HSEW is a combination of D-HSE and D-HSW. Benefits of using D-HS include ease of case and similarity knowledge maintenance, simplicity, accuracy, and speed in comparison to conventional approaches widely used in CBR. We present results from the analysis of 20 case bases for classification problems and 15 case bases for regression problems. We demonstrate the improvements in accuracy and/or efficiency of each D-HS modification in comparison to traditional k-NN, R-tree, C4.5, and M5 techniques and show D-HS to be a very attractive approach for indexing case bases. We also illuminate potential areas for further improvement of the D-HS approach. © 2007 Wiley Periodicals, Inc. Int J Int Syst 22: 353–383, 2007. [source]
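A minimal reading of D-HSM, reconstructed from the description above (fixed-width intervals, a matrix of cases keyed by discretized attribute values, similarity as simple intersection); the authors' actual data structures may well differ:

```python
import numpy as np
from collections import defaultdict

def build_index(cases, n_intervals=4):
    # D-HSM-style index: for each attribute, map each fixed-width
    # interval to the set of case ids whose value falls inside it.
    cases = np.asarray(cases, dtype=float)
    lo, hi = cases.min(axis=0), cases.max(axis=0)
    width = (hi - lo) / n_intervals
    bins = np.minimum(((cases - lo) / width).astype(int), n_intervals - 1)
    index = defaultdict(set)
    for cid, row in enumerate(bins):
        for attr, b in enumerate(row):
            index[(attr, b)].add(cid)
    return index, lo, width

def retrieve(query, index, lo, width, n_cases, n_intervals=4):
    # Similarity of the query to each stored case = number of
    # attributes landing in the same interval (simple intersection).
    votes = np.zeros(n_cases, dtype=int)
    b = np.minimum(((np.asarray(query) - lo) / width).astype(int),
                   n_intervals - 1)
    for attr, bv in enumerate(b):
        for cid in index.get((attr, bv), ()):
            votes[cid] += 1
    return int(votes.argmax())

cases = [[0.1, 0.2], [0.9, 0.8], [0.15, 0.25], [0.85, 0.9]]
index, lo, width = build_index(cases)
best = retrieve([0.12, 0.22], index, lo, width, n_cases=4)
```

Retrieval touches only the cases sharing an interval with the query, which is the source of the speed advantage over exhaustive k-NN comparison.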


Partial least squares for discrimination

JOURNAL OF CHEMOMETRICS, Issue 3 2003
Matthew Barker
Abstract Partial least squares (PLS) was not originally designed as a tool for statistical discrimination. In spite of this, applied scientists routinely use PLS for classification, and there is substantial empirical evidence to suggest that it performs well in that role. The interesting question is: why can a procedure that is principally designed for overdetermined regression problems locate and emphasize group structure? Using PLS in this manner has heuristic support owing to the relationship between PLS and canonical correlation analysis (CCA) and the relationship, in turn, between CCA and linear discriminant analysis (LDA). This paper replaces the heuristics with a formal statistical explanation. As a consequence, it will become clear that PLS is to be preferred over PCA when discrimination is the goal and dimension reduction is needed. Copyright © 2003 John Wiley & Sons, Ltd. [source]
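The covariance-seeking behaviour that makes PLS useful for discrimination shows up already in the first PLS weight vector, which for a dummy-coded class label is proportional to X'y. A hedged two-class sketch (one component, synthetic data):

```python
import numpy as np

def pls1_direction(X, y):
    # First PLS weight vector: w is proportional to X'y, the direction
    # of maximal covariance between predictors and the class label.
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    w = Xc.T @ yc
    return w / np.linalg.norm(w)

# Two well-separated classes in 2-D; label dummy-coded as +/-1.
rng = np.random.default_rng(3)
X = np.vstack([rng.standard_normal((50, 2)) + [2, 2],
               rng.standard_normal((50, 2)) - [2, 2]])
y = np.r_[np.ones(50), -np.ones(50)]

w = pls1_direction(X, y)
scores = (X - X.mean(axis=0)) @ w       # project onto the PLS direction
pred = np.where(scores > 0, 1, -1)      # threshold the scores at zero
accuracy = (pred == y).mean()
```

Because the weight vector chases covariance with the label rather than raw predictor variance (as PCA does), the projection separates the groups, which is the intuition the paper formalizes.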


Nonlinear experimental design using Bayesian regularized neural networks

AICHE JOURNAL, Issue 6 2007
Matthew C Coleman
Abstract Novel criteria for designing experiments for nonlinear processes are presented. These criteria improve on a previous methodology in that they can be used to suggest a batch of new experiments to perform (as opposed to a single new experiment) and are also optimized for discovering improved optima of the system response. This is accomplished using information-theoretic criteria, which also heuristically penalize experiments that are likely to yield low (nonoptimal) responses. While the methods may be applied to any type of nonlinear nonparametric model (e.g., radial basis functions or generalized linear regression), they are considered here exclusively in conjunction with Bayesian regularized feedforward neural networks. We focus on the application to rapid process development and show how repeated experiments can be used to optimize the training procedures of Bayesian regularized neural networks. The presented methods are applied to three case studies. The first two involve simulations of one- and two-dimensional (2-D) nonlinear regression problems. The third involves real historical data from bench-scale fermentations generated in our laboratory. It is shown that using the presented criteria to design new experiments can greatly increase a feedforward neural network's ability to predict global optima. © 2007 American Institute of Chemical Engineers AIChE J, 2007 [source]


Adjustment for Missingness Using Auxiliary Information in Semiparametric Regression

BIOMETRICS, Issue 1 2010
Donglin Zeng
Summary In this article, we study the estimation of the mean response and the regression coefficient in semiparametric regression problems when the response variable is subject to nonrandom missingness. When the missingness is independent of the response conditional on high-dimensional auxiliary information, the parametric approach may misspecify the relationship between covariates and response, while the nonparametric approach is infeasible because of the curse of dimensionality. To overcome this, we study a model-based approach that condenses the auxiliary information and estimates the parameters of interest nonparametrically on the condensed covariate space. Our estimators possess the double robustness property, i.e., they are consistent whenever the model for the response given auxiliary covariates or the model for the missingness given auxiliary covariates is correct. We conduct a number of simulations to compare the numerical performance of our estimators with other existing estimators in the missing data literature, including the propensity score approach and the inverse probability weighted estimating equation. A set of real data is used to illustrate our approach. [source]


Roman Period fetal skeletons from the East Cemetery (Kellis 2) of Kellis, Egypt

INTERNATIONAL JOURNAL OF OSTEOARCHAEOLOGY, Issue 5 2005
M. W. Tocheri
Abstract Much can be learned about the religious ideology and mortuary patterns, as well as the demographic and health profiles, of a population from archaeological human fetal skeletons. Fetal skeletons are rare, however, largely due to poor preservation and recovery, misidentification, or non-inclusion in general burial populations. We present an analysis of 82 fetal/perinatal skeletons recovered from Kellis 2, a Roman Period cemetery dated to the third and fourth centuries AD, located in the Dakhleh Oasis, Egypt. Most of the fetal remains were individually wrapped in linen and all were buried among the general cemetery population in a supine, east–west orientation with the head facing to the west. Gestational age estimates are calculated from diaphysis lengths using published regression and Bayesian methods. The overall similarity between the fetal age distributions calculated from the regression and Bayesian methods suggests that the correlation between diaphysis length and gestational age is typically strong enough to avoid the 'regression' problem of having the age structure of reference samples adversely affect the age distribution of target samples. The inherent bias of the regression methods, however, is primarily reflected in the gestational age categories between 36 and 42 weeks, corresponding with the expected increase in growth variation during the late third trimester. The results suggest that the fetal age distribution at Kellis 2 does not differ from the natural expected mortality distribution. Therefore, practices such as infanticide can be ruled out as having a significant effect on the observed mortality distribution. Moreover, the Kellis 2 sample is well represented in each gestational age category, suggesting that all premature stillbirths and neonatal deaths received similar burial rites.
The age distribution of the Kellis 2 fetal remains suggests that emerging Christian concepts, such as the 'soul' and the 'afterlife', were being applied to everyone, including fetuses of all gestational ages. Copyright © 2005 John Wiley & Sons, Ltd. [source]


Tourism demand modelling: some issues regarding unit roots, co-integration and diagnostic tests

INTERNATIONAL JOURNAL OF TOURISM RESEARCH, Issue 5 2003
Paresh Kumar Narayan
Abstract This paper investigates the all-important issue of diagnostic tests, including unit roots and cointegration, in the tourism demand modelling literature. The origins of this study lie in the apparent lack of detail in the tourism economics literature concerning diagnostic tests. Study of this deficiency suggests that the previous literature on tourism demand modelling may be divided into two categories: pre-1995 and post-1995 studies. It was found that the pre-1995 and some post-1995 studies have ignored unit root tests and co-integration and, hence, are vulnerable to the so-called 'spurious regression' problem. In highlighting the key diagnostic tests reported by post-1995 studies, this paper contends that there is no need to report the autoregressive conditional heteroskedasticity (ARCH) test, which is applicable only to financial market analysis where the dependent variable is the return on an asset. More generally, heteroskedasticity is not seen as a problem in time-series data. However, reporting a greater than necessary range of diagnostic tests, some of which have no theoretical justification in tourism demand analysis, does not diminish the precision of the results or the model. This paper should appeal to scholars involved in tourism demand modelling. Copyright © 2003 John Wiley & Sons, Ltd. [source]
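The unit root testing the paper discusses can be illustrated with a stripped-down Dickey-Fuller regression (no lagged differences and no trend term, so this is not a full ADF test): a stationary series produces a strongly negative t-statistic on the lagged level, while a random walk does not.

```python
import numpy as np

def df_tstat(y):
    # Simplified Dickey-Fuller regression: dy_t = a + b*y_{t-1} + e_t.
    # A strongly negative t-statistic on b is evidence against a unit root.
    dy = np.diff(y)
    ylag = y[:-1]
    A = np.column_stack([np.ones(len(ylag)), ylag])
    coef, *_ = np.linalg.lstsq(A, dy, rcond=None)
    resid = dy - A @ coef
    s2 = resid @ resid / (len(dy) - 2)
    cov = s2 * np.linalg.inv(A.T @ A)
    return coef[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(5)
e = rng.standard_normal(500)
walk = np.cumsum(e)                 # I(1): unit root present
ar = np.zeros(500)                  # stationary AR(1) with phi = 0.5
for t in range(1, 500):
    ar[t] = 0.5 * ar[t - 1] + e[t]

t_walk, t_ar = df_tstat(walk), df_tstat(ar)
```

Note that the t-statistic must be compared against Dickey-Fuller critical values, not the usual normal ones; regressing levels of two such walks on each other is exactly the spurious regression trap the abstract warns about.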