Mean-squared Error

Selected Abstracts


RECOGNIZING STRONG AND WEAK OPINION CLAUSES

COMPUTATIONAL INTELLIGENCE, Issue 2 2006
Theresa Wilson
There has been a recent swell of interest in the automatic identification and extraction of opinions and emotions in text. In this paper, we present the first experimental results classifying the intensity of opinions and other types of subjectivity and classifying the subjectivity of deeply nested clauses. We use a wide range of features, including new syntactic features developed for opinion recognition. We vary the learning algorithm and the feature organization to explore the effect this has on the classification task. In 10-fold cross-validation experiments using support vector regression, we achieve improvements in mean-squared error over baseline ranging from 49% to 51%. Using boosting, we achieve improvements in accuracy ranging from 23% to 96%.
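
The abstract reports improvements in mean-squared error "over baseline" as percentages. A minimal sketch of one common way to compute such a figure follows; it assumes the improvement is the relative reduction in MSE, which the abstract does not state explicitly. The labels, predictions, and function names are hypothetical and are not taken from the paper.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_improvement(mse_baseline, mse_model):
    """Fractional reduction in MSE relative to a baseline (assumed definition)."""
    return (mse_baseline - mse_model) / mse_baseline


# Hypothetical intensity labels and predictions, for illustration only.
y_true = [0.0, 1.0, 2.0, 3.0]
baseline_pred = [1.5, 1.5, 1.5, 1.5]  # baseline: always predict the mean label
model_pred = [0.5, 1.0, 2.0, 2.5]     # made-up model output

mse_base = mean_squared_error(y_true, baseline_pred)
mse_model = mean_squared_error(y_true, model_pred)
improvement = relative_improvement(mse_base, mse_model)
print(f"baseline MSE={mse_base}, model MSE={mse_model}, improvement={improvement:.0%}")
```

In a cross-validation setting, the same two functions would be applied to the held-out predictions of each fold, and the per-fold MSE values averaged before comparing against the baseline.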


Support vector regression to predict asphalt mix performance

INTERNATIONAL JOURNAL FOR NUMERICAL AND ANALYTICAL METHODS IN GEOMECHANICS, Issue 16 2008
Maher Maalouf
Abstract. Material properties are essential in the design and evaluation of pavements. In this paper, the potential of the support vector regression (SVR) algorithm is explored to predict the resilient modulus (MR), which is an essential property in designing and evaluating pavement materials, particularly hot mix asphalt typically used in Oklahoma. SVR is a statistical learning algorithm that is applied to regression problems; in our study, SVR was shown to be superior to least squares (LS). Compared with the widely used LS method, the results of this study show that SVR significantly reduces the mean-squared error and improves the correlation coefficient. Copyright © 2008 John Wiley & Sons, Ltd.


Model choice in time series studies of air pollution and mortality

JOURNAL OF THE ROYAL STATISTICAL SOCIETY: SERIES A (STATISTICS IN SOCIETY), Issue 2 2006
Roger D. Peng
Summary. Multicity time series studies of particulate matter and mortality and morbidity have provided evidence that daily variation in air pollution levels is associated with daily variation in mortality counts. These findings served as key epidemiological evidence for the recent review of the US national ambient air quality standards for particulate matter. As a result, methodological issues concerning time series analysis of the relationship between air pollution and health have attracted the attention of the scientific community and critics have raised concerns about the adequacy of current model formulations. Time series data on pollution and mortality are generally analysed by using log-linear, Poisson regression models for overdispersed counts with the daily number of deaths as outcome, the (possibly lagged) daily level of pollution as a linear predictor and smooth functions of weather variables and calendar time used to adjust for time-varying confounders. Investigators around the world have used different approaches to adjust for confounding, making it difficult to compare results across studies. To date, the statistical properties of these different approaches have not been comprehensively compared. To address these issues, we quantify and characterize model uncertainty and model choice in adjusting for seasonal and long-term trends in time series models of air pollution and mortality. First, we conduct a simulation study to compare and describe the properties of statistical methods that are commonly used for confounding adjustment. We generate data under several confounding scenarios and systematically compare the performance of the various methods with respect to the mean-squared error of the estimated air pollution coefficient. We find that the bias in the estimates generally decreases with more aggressive smoothing and that model selection methods which optimize prediction may not be suitable for obtaining an estimate with small bias.
Second, we apply and compare the modelling approaches with the National Morbidity, Mortality, and Air Pollution Study database which comprises daily time series of several pollutants, weather variables and mortality counts covering the period 1987–2000 for the largest 100 cities in the USA. When applying these approaches to adjusting for seasonal and long-term trends we find that the Study's estimates for the national average effect of PM10 at lag 1 on mortality vary over approximately a twofold range, with 95% posterior intervals always excluding zero risk.
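
The simulation study above compares methods by the mean-squared error of an estimated coefficient and discusses bias separately. The sketch below illustrates why the two can differ, using the standard identity MSE = bias² + variance. It is a toy Monte Carlo experiment, not the authors' simulation design: the estimators, the Gaussian data, and all names are assumptions made for illustration.

```python
import random


def simulate_estimates(estimator, true_value, n_obs, n_reps, seed=0):
    """Apply an estimator to repeated synthetic Gaussian samples (unit variance)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_reps):
        sample = [rng.gauss(true_value, 1.0) for _ in range(n_obs)]
        estimates.append(estimator(sample))
    return estimates


def bias_variance_mse(estimates, true_value):
    """Return (bias, variance, mse) of the estimates around the true value."""
    n = len(estimates)
    mean_est = sum(estimates) / n
    bias = mean_est - true_value
    variance = sum((e - mean_est) ** 2 for e in estimates) / n
    mse = sum((e - true_value) ** 2 for e in estimates) / n
    return bias, variance, mse


def sample_mean(sample):
    return sum(sample) / len(sample)


def shrunken_mean(sample):
    # Deliberately biased toward zero, trading bias for lower variance.
    return 0.8 * sample_mean(sample)


TRUE_VALUE = 2.0  # hypothetical "true coefficient"
results = {}
for name, est in [("sample_mean", sample_mean), ("shrunken_mean", shrunken_mean)]:
    estimates = simulate_estimates(est, TRUE_VALUE, n_obs=20, n_reps=2000)
    results[name] = bias_variance_mse(estimates, TRUE_VALUE)
    bias, variance, mse = results[name]
    print(f"{name}: bias={bias:.4f} variance={variance:.4f} mse={mse:.4f}")
```

Because the variance here is computed with divisor n, the decomposition holds exactly for the simulated estimates, so a method can be ranked on bias and on MSE separately from the same set of replications.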


On variable bandwidth selection in local polynomial regression

JOURNAL OF THE ROYAL STATISTICAL SOCIETY: SERIES B (STATISTICAL METHODOLOGY), Issue 3 2000
Kjell Doksum
The performances of data-driven bandwidth selection procedures in local polynomial regression are investigated by using asymptotic methods and simulation. The bandwidth selection procedures considered are based on minimizing 'prelimit' approximations to the (conditional) mean-squared error (MSE) when the MSE is considered as a function of the bandwidth h. We first consider approximations to the MSE that are based on Taylor expansions around h=0 of the bias part of the MSE. These approximations lead to estimators of the MSE that are accurate only for small bandwidths h. We also consider a bias estimator which instead of using small h approximations to bias naïvely estimates bias as the difference of two local polynomial estimators of different order and we show that this estimator performs well only for moderate to large h. We next define a hybrid bias estimator which equals the Taylor-expansion-based estimator for small h and the difference estimator for moderate to large h. We find that the MSE estimator based on this hybrid bias estimator leads to a bandwidth selection procedure with good asymptotic and, for our Monte Carlo examples, finite sample properties.


Bootstrap-based bandwidth choice for log-periodogram regression

JOURNAL OF TIME SERIES ANALYSIS, Issue 6 2009
Josu Arteche
Abstract. The choice of the bandwidth in the local log-periodogram regression is of crucial importance for estimation of the memory parameter of a long memory time series. Different choices may give rise to completely different estimates, which may lead to contradictory conclusions, for example about the stationarity of the series. We propose here a data-driven bandwidth selection strategy that is based on minimizing a bootstrap approximation of the mean-squared error (MSE). Its behaviour is compared with other existing techniques for optimal bandwidth selection in an MSE sense, revealing its better performance in a wider class of models. The empirical applicability of the proposed strategy is shown with two examples: the Nile river annual minimum levels, widely analysed in a long memory context, and the input gas rate series of Box and Jenkins.


Bayesian analysis for weighted mean-squared error in dual response surface optimization

QUALITY AND RELIABILITY ENGINEERING INTERNATIONAL, Issue 5 2010
In-Jun Jeong
Abstract. Dual response surface optimization considers the mean and the variation simultaneously. The minimization of mean-squared error (MSE) is an effective approach in dual response surface optimization. Weighted MSE (WMSE) is formed by imposing the relative weights, (λ, 1−λ), on the squared bias and variance components of MSE. To date, a few methods have been proposed for determining λ. The resulting λ from these methods is either a single value or an interval. This paper aims at developing a systematic method to choose a λ value when an interval of λ is given. Specifically, this paper proposes a Bayesian approach to construct a probability distribution of λ. Once the distribution of λ is constructed, the expected value of λ can be used to form WMSE. Copyright © 2009 John Wiley & Sons, Ltd.
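
The weighted MSE described above can be sketched as follows, assuming a weight `lam` is placed on the squared bias and `1 - lam` on the variance. The discrete distribution over candidate weights is purely hypothetical and stands in for whatever distribution the paper's Bayesian procedure would produce; the function names are illustrative, not the authors'.

```python
def weighted_mse(bias, variance, lam):
    """Weighted MSE: lam * bias^2 + (1 - lam) * variance, with lam in [0, 1]."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * bias ** 2 + (1.0 - lam) * variance


def expected_weight(weights, probabilities):
    """Expected value of the weight under a discrete distribution."""
    if len(weights) != len(probabilities):
        raise ValueError("weights and probabilities must have equal length")
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(w * p for w, p in zip(weights, probabilities))


# Hypothetical distribution over an interval of candidate weights.
candidate_weights = [0.25, 0.5, 0.75]
candidate_probs = [0.25, 0.5, 0.25]

lam_hat = expected_weight(candidate_weights, candidate_probs)
wmse = weighted_mse(bias=2.0, variance=3.0, lam=lam_hat)
print(f"expected weight={lam_hat}, WMSE={wmse}")
```

With `lam = 0.5` the weighted criterion is exactly half the ordinary MSE (bias² + variance), so minimizing either gives the same optimum; other weights shift the emphasis toward bias or toward variance.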


Designing an accelerated degradation experiment by optimizing the estimation of the percentile

QUALITY AND RELIABILITY ENGINEERING INTERNATIONAL, Issue 3 2003
Hong-Fwu Yu
Abstract. Degradation tests are widely used to assess the reliability of highly reliable products which are not likely to fail under traditional life tests or accelerated life tests. However, for some highly reliable products, the degradation may be very slow and hence it is impossible to have a precise assessment within a reasonable amount of testing time. In such cases, an alternative is to use higher stresses to extrapolate the product's reliability at the design stress. This is called an accelerated degradation test (ADT). In conducting an ADT, several decision variables at each stress level, such as the inspection frequency, sample size and termination time, influence the experimental efficiency. An inappropriate choice of these decision variables not only wastes experimental resources but also reduces the precision of the estimation of the product's reliability at the use condition. The main purpose of this paper is to deal with the problem of designing an ADT. By using the criterion of minimizing the mean-squared error of the estimated 100p-th percentile of the product's lifetime distribution at the use condition subject to the constraint that the total experimental cost does not exceed a predetermined budget, a nonlinear integer programming problem is built to derive the optimal combination of the sample size, inspection frequency and the termination time at each stress level. A numerical example is provided to illustrate the proposed method. Copyright © 2003 John Wiley & Sons, Ltd.


Robust designs for misspecified exponential regression models

APPLIED STOCHASTIC MODELS IN BUSINESS AND INDUSTRY, Issue 2 2009
Xiaojian Xu
Abstract. We consider the construction of designs for exponential regression. The response function is an only approximately known function of a specified exponential function. As well, we allow for variance heterogeneity. We find minimax designs and corresponding optimal regression weights in the context of the following problems: (1) for nonlinear least-squares (LS) estimation with homoscedasticity, determine a design to minimize the maximum value of the integrated mean-squared error (IMSE), with the maximum being evaluated for the possible departures from the response function; (2) for nonlinear LS estimation with heteroscedasticity, determine a design to minimize the maximum value of IMSE, with the maximum being evaluated over both types of departures; (3) for nonlinear weighted LS estimation, determine both weights and a design to minimize the maximum IMSE; and (4) choose weights and design points to minimize the maximum IMSE, subject to a side condition of unbiasedness. Solutions to (1)–(4) are given in complete generality. Copyright © 2009 John Wiley & Sons, Ltd.


An Empirical Bayes Method for Estimating Epistatic Effects of Quantitative Trait Loci

BIOMETRICS, Issue 2 2007
Shizhong Xu
Summary. The genetic variance of a quantitative trait is often controlled by the segregation of multiple interacting loci. Linear model regression analysis is usually applied to estimating and testing effects of these quantitative trait loci (QTL). Including all the main effects and the effects of interaction (epistatic effects), the dimension of the linear model can be extremely high. Variable selection via stepwise regression or stochastic search variable selection (SSVS) is the common procedure for epistatic effect QTL analysis. These methods are computationally intensive, yet they may not be optimal. The LASSO (least absolute shrinkage and selection operator) method is computationally more efficient than the above methods. As a result, it has been widely used in regression analysis for large models. However, LASSO has never been applied to genetic mapping for epistatic QTL, where the number of model effects is typically many times larger than the sample size. In this study, we developed an empirical Bayes method (E-BAYES) to map epistatic QTL under the mixed model framework. We also tested the feasibility of using LASSO to estimate epistatic effects, examined the fully Bayesian SSVS, and reevaluated the penalized likelihood (PENAL) methods in mapping epistatic QTL. Simulation studies showed that all the above methods performed satisfactorily well. However, E-BAYES appears to outperform all other methods in terms of minimizing the mean-squared error (MSE) with relatively short computing time. Application of the new method to real data was demonstrated using a barley dataset.