Data Models (data + models)

Distribution by Scientific Domains
Distribution within Business, Economics, Finance and Accounting

Kinds of Data Models

  • count data models
  • panel data models


  • Selected Abstracts


    Semiparametric Bayesian Inference in Autoregressive Panel Data Models

    ECONOMETRICA, Issue 2 2002
    Keisuke Hirano
    First page of article [source]


    A Bayesian Chi-Squared Goodness-of-Fit Test for Censored Data Models

    BIOMETRICS, Issue 2 2010
    Jing Cao
    Summary We propose a Bayesian chi-squared model diagnostic for analysis of data subject to censoring. The test statistic has the form of Pearson's chi-squared test statistic and is easy to calculate from standard output of Markov chain Monte Carlo algorithms. The key innovation of this diagnostic is that it is based only on observed failure times. Because it does not rely on the imputation of failure times for observations that have been censored, we show that under heavy censoring it can have higher power for detecting model departures than a comparable test based on the complete data. In a simulation study, we show that tests based on this diagnostic exhibit comparable power and better nominal Type I error rates than a commonly used alternative test proposed by Akritas (1988, Journal of the American Statistical Association, 83, 222–230). An important advantage of the proposed diagnostic is that it can be applied to a broad class of censored data models, including generalized linear models and other models with nonidentically distributed and nonadditive error structures. We illustrate the proposed model diagnostic for testing the adequacy of two parametric survival models for Space Shuttle main engine failures. [source]
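
    For orientation, the sketch below implements only the complete-data building block of such a diagnostic: a Pearson-type statistic evaluated at posterior draws, using equiprobable bins formed from the model CDF. It is not the censored-data extension proposed in the paper; the exponential working model, the conjugate stand-in for MCMC draws, and the simulated failure times are hypothetical.

    ```python
    # Minimal sketch: Pearson-type chi-squared diagnostic evaluated at posterior
    # draws for a fully observed sample (the complete-data building block only,
    # not the censored-data extension proposed in the paper).
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)

    # Hypothetical failure times; the working model is exponential,
    # while the simulated data are mildly non-exponential.
    t = rng.weibull(1.3, size=200) * 2.0

    # Conjugate Gamma "posterior" for the exponential rate (stand-in for MCMC draws).
    post_rate = rng.gamma(shape=1.0 + len(t), scale=1.0 / (1.0 + t.sum()), size=2000)

    K = 5                                           # equiprobable bins on [0, 1]
    expected = len(t) / K
    stat = np.empty(len(post_rate))
    for i, lam in enumerate(post_rate):
        u = stats.expon.cdf(t, scale=1.0 / lam)     # probability integral transform
        counts, _ = np.histogram(u, bins=np.linspace(0.0, 1.0, K + 1))
        stat[i] = np.sum((counts - expected) ** 2 / expected)

    # Under an adequate model the statistic is roughly chi-squared with K-1 df;
    # a large exceedance rate across draws signals lack of fit.
    print(np.mean(stat > stats.chi2.ppf(0.95, df=K - 1)))
    ```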


    Interactive Visualization with Programmable Graphics Hardware

    COMPUTER GRAPHICS FORUM, Issue 3 2002
    Thomas Ertl
    One of the main scientific goals of visualization is the development of algorithms and appropriate data models which facilitate interactive visual analysis and direct manipulation of the increasingly large data sets which result from simulations running on massive parallel computer systems, from measurements employing fast high-resolution sensors, or from large databases and hierarchical information spaces. This task can only be achieved with the optimization of all stages of the visualization pipeline: filtering, compression, and feature extraction of the raw data sets, adaptive visualization mappings which allow the users to choose between speed and accuracy, and exploiting new graphics hardware features for fast and high-quality rendering. The recent introduction of advanced programmability in widely available graphics hardware has already led to impressive progress in the area of volume visualization. However, besides the acceleration of the final rendering, flexible graphics hardware is increasingly being used also for the mapping and filtering stages of the visualization pipeline, thus giving rise to new levels of interactivity in visualization applications. The talk will present recent results of applying programmable graphics hardware in various visualization algorithms covering volume data, flow data, terrains, NPR rendering, and distributed and remote applications. [source]


    Spatial Information: Classification and Applications in Building Design

    COMPUTER-AIDED CIVIL AND INFRASTRUCTURE ENGINEERING, Issue 4 2002
    Tang-Hung Nguyen
    Physical properties of building components are usually represented in building data models by their three-dimensional geometry and topology, also called spatial information. While geometric data of building components can be manipulated and managed through a computer-aided design (CAD) interface, their spatial relations (or topological information) are conventionally entered into data models by hand. This manual representation, however, is inherently complex and challenging because of the wide variety of spatial relationships. Topological information should therefore be classified and modeled in such a way that the spatial data required for a particular design task can be retrieved automatically. This paper identifies and classifies the kinds of topological information commonly used in building design and construction into more specific categories (e.g., adjacency, connection, containment, separation, and intersection) to support automatic deduction of spatial information in a computer-based building design system. The paper also discusses typical applications of the topological relations to different design activities. Finally, the development of deduction algorithms and the proposed building design system are briefly described. [source]
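
    As a toy illustration of deducing such relations automatically, the sketch below classifies the relation between two axis-aligned 3-D bounding boxes into the categories named above (adjacency/connection, containment, intersection, separation). The box representation, tolerance, and example components are assumptions made for illustration; it is not the deduction algorithm of the proposed system.

    ```python
    # Toy sketch: classify the topological relation between two axis-aligned
    # 3-D bounding boxes into the categories discussed in the abstract.
    from dataclasses import dataclass


    @dataclass
    class Box:
        lo: tuple[float, float, float]
        hi: tuple[float, float, float]


    def relation(a: Box, b: Box, tol: float = 1e-9) -> str:
        contains = all(a.lo[i] <= b.lo[i] and b.hi[i] <= a.hi[i] for i in range(3))
        contained = all(b.lo[i] <= a.lo[i] and a.hi[i] <= b.hi[i] for i in range(3))
        # Overlap length per axis; a negative value means a gap along that axis.
        overlap = [min(a.hi[i], b.hi[i]) - max(a.lo[i], b.lo[i]) for i in range(3)]
        if contains or contained:
            return "containment"
        if all(o > tol for o in overlap):
            return "intersection"
        if all(o >= -tol for o in overlap):      # touching faces or edges only
            return "adjacency (connection)"
        return "separation"


    wall = Box((0, 0, 0), (5, 0.3, 3))
    floor_slab = Box((0, 0.3, 0), (5, 6, 0.2))   # butts against the wall face
    print(relation(wall, floor_slab))            # -> adjacency (connection)
    ```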


    ModEco: an integrated software package for ecological niche modeling

    ECOGRAPHY, Issue 4 2010
    Qinghua Guo
    ModEco is a software package for ecological niche modeling. It integrates a range of niche modeling methods within a geographical information system. ModEco provides a user friendly platform that enables users to explore, analyze, and model species distribution data with relative ease. ModEco has several unique features: 1) it deals with different types of ecological observation data, such as presence and absence data, presence-only data, and abundance data; 2) it provides a range of models when dealing with presence-only data, such as presence-only models, pseudo-absence models, background vs presence data models, and ensemble models; and 3) it includes relatively comprehensive tools for data visualization, feature selection, and accuracy assessment. [source]
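
    The pseudo-absence strategy mentioned above can be illustrated in a few lines: sample background points from the study region, label them as absences, and fit an ordinary classifier. The sketch below uses synthetic covariates and scikit-learn; it shows the general idea only, not ModEco's implementation.

    ```python
    # Minimal sketch of the pseudo-absence approach to presence-only data.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(1)

    # Hypothetical environmental covariates at presence locations (temperature, rainfall).
    presence = rng.normal(loc=[22.0, 800.0], scale=[2.0, 150.0], size=(300, 2))

    # Pseudo-absences: random background points drawn over the study region.
    background = np.column_stack([
        rng.uniform(5.0, 35.0, size=1000),      # assumed temperature range of the region
        rng.uniform(100.0, 1500.0, size=1000),  # assumed rainfall range of the region
    ])

    X = np.vstack([presence, background])
    y = np.concatenate([np.ones(len(presence)), np.zeros(len(background))])

    model = LogisticRegression(max_iter=1000).fit(X, y)

    # Relative habitat-suitability score for a new location.
    print(model.predict_proba([[24.0, 900.0]])[0, 1])
    ```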


    Bounds on Parameters in Panel Dynamic Discrete Choice Models

    ECONOMETRICA, Issue 3 2006
    Bo E. Honoré
    Identification of dynamic nonlinear panel data models is an important and delicate problem in econometrics. In this paper we provide insights that shed light on the identification of parameters of some commonly used models. Using these insights, we are able to show through simple calculations that point identification often fails in these models. On the other hand, these calculations also suggest that the model restricts the parameter to lie in a region that is very small in many cases, and the failure of point identification may, therefore, be of little practical importance in those cases. Although the emphasis is on identification, our techniques are constructive in that they can easily form the basis for consistent estimates of the identified sets. [source]


    How Did the Elimination of the US Earnings Test above the Normal Retirement Age Affect Labour Supply Expectations?

    FISCAL STUDIES, Issue 2 2008
    Pierre-Carl Michaud
    JEL classification: H55; J22. Abstract We look at the effect of the 2000 repeal of the earnings test above the normal retirement age (NRA) on the self-reported probabilities of working full-time after ages 65 and 62 of male workers in the US Health and Retirement Study (HRS). Using administrative records on social security benefit entitlements linked to the HRS survey data, we can distinguish groups of respondents according to the predicted effect of the earnings test before its repeal on their marginal wage rate after the NRA. We use panel data models with fixed and random effects to investigate the effect of the repeal. We find that male workers whose predicted marginal wage rate increased because the earnings test was repealed had the largest increase in the subjective probability of working full-time after age 65. We find no significant effects of the repeal on the subjective probability of working full-time past age 62. [source]
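
    A minimal sketch of the fixed-effects (within) estimator behind such panel analyses is given below, on synthetic data with hypothetical variable names (repeal_gain, prob_work_65); it is not the authors' specification or data.

    ```python
    # Minimal sketch of a fixed-effects (within) panel estimator on synthetic data.
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(7)
    n_persons, n_waves = 500, 4

    df = pd.DataFrame({
        "person": np.repeat(np.arange(n_persons), n_waves),
        "wave": np.tile(np.arange(n_waves), n_persons),
    })
    alpha = rng.normal(size=n_persons)                    # unobserved person effect
    df["repeal_gain"] = rng.binomial(1, 0.5, len(df))     # hypothetical time-varying regressor
    df["prob_work_65"] = (0.4 + 0.08 * df["repeal_gain"]
                          + 0.1 * alpha[df["person"]]
                          + rng.normal(scale=0.05, size=len(df)))

    # Within transformation: demean outcome and regressor by person.
    y = df["prob_work_65"] - df.groupby("person")["prob_work_65"].transform("mean")
    x = df["repeal_gain"] - df.groupby("person")["repeal_gain"].transform("mean")

    beta_fe = float((x * y).sum() / (x ** 2).sum())
    print(f"within (fixed-effects) estimate: {beta_fe:.3f}")   # close to the true 0.08
    ```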


    Physical inactivity and its impact on healthcare utilization

    HEALTH ECONOMICS, Issue 8 2009
    Nazmi Sari
    Abstract Physically inactive people are expected to use more healthcare services than active people. This inactivity imposes costs on the collectively funded health insurance programs. In this paper, excess utilization of healthcare services due to physical inactivity is examined using count data models and the Canadian Community Health Survey. The aim of the paper is to estimate utilization of healthcare services associated with inactivity and to estimate its impact on the Canadian healthcare system. The results suggest that physical inactivity increases hospital stays, and use of physician and nurse services. On average, an inactive person spends 38% more days in hospital than an active person. S/he also uses 5.5% more family physician visits, 13% more specialist services, and 12% more nurse visits than an active individual. The subsequent social cost of inactivity for the healthcare system is substantial. Copyright © 2008 John Wiley & Sons, Ltd. [source]
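
    A minimal count-data sketch in the same spirit is shown below: a Poisson GLM for visit counts on synthetic data with hypothetical variable names, with the coefficient read as a proportional change in expected utilization (a negative binomial model would be the natural refinement under overdispersion). It is illustrative only, not the paper's model or data.

    ```python
    # Minimal sketch of a count data model for healthcare utilization.
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    rng = np.random.default_rng(3)
    n = 5000
    df = pd.DataFrame({
        "inactive": rng.binomial(1, 0.4, n),
        "age": rng.uniform(20, 80, n),
    })
    mu = np.exp(-1.0 + 0.12 * df["inactive"] + 0.02 * df["age"])
    df["gp_visits"] = rng.poisson(mu)

    X = sm.add_constant(df[["inactive", "age"]])
    fit = sm.GLM(df["gp_visits"], X, family=sm.families.Poisson()).fit()

    # exp(beta) - 1 is the proportional change in expected visits for inactive people.
    print(f"{100 * (np.exp(fit.params['inactive']) - 1):.1f}% more visits if inactive")
    ```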


    Gender differences in smoking behavior

    HEALTH ECONOMICS, Issue 9 2007
    Thomas Bauer
    Abstract This paper investigates gender differences in smoking behavior using data from the German Socio-economic Panel (SOEP). We develop a Blinder–Oaxaca decomposition method for count data models, which allows us to isolate the part of the gender differential in the number of cigarettes smoked daily that can be explained by differences in observable characteristics from the part attributable to differences in coefficients. Our results reveal that the major part of the gender smoking differential is attributable to differences in coefficients, indicating substantial differences in the smoking behavior between men and women. Copyright © 2007 John Wiley & Sons, Ltd. [source]
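
    The sketch below illustrates the general logic of a Blinder–Oaxaca-type decomposition for a count outcome: fit a count model separately by group, then split the gap in mean counts into a characteristics (explained) part and a coefficients (unexplained) part, using one group's coefficients as the reference. The Poisson specification, the synthetic data, and the covariates are assumptions, not the authors' estimator or the SOEP data.

    ```python
    # Minimal sketch of an Oaxaca-type decomposition for a count outcome
    # (e.g., cigarettes per day), using Poisson models fitted by group.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(5)

    def simulate(n, beta):
        X = sm.add_constant(np.column_stack([rng.uniform(20, 60, n),      # age
                                             rng.binomial(1, 0.5, n)]))   # employed
        return X, rng.poisson(np.exp(X @ beta))

    X_m, y_m = simulate(2000, np.array([1.2, 0.010, 0.30]))   # "men"
    X_f, y_f = simulate(2000, np.array([1.0, 0.008, 0.20]))   # "women"

    b_m = sm.GLM(y_m, X_m, family=sm.families.Poisson()).fit().params
    b_f = sm.GLM(y_f, X_f, family=sm.families.Poisson()).fit().params

    # Decompose the gap in mean counts with men's coefficients as the reference.
    gap = y_m.mean() - y_f.mean()
    explained = np.exp(X_m @ b_m).mean() - np.exp(X_f @ b_m).mean()    # characteristics
    unexplained = np.exp(X_f @ b_m).mean() - np.exp(X_f @ b_f).mean()  # coefficients
    print(f"gap={gap:.2f}  explained={explained:.2f}  unexplained={unexplained:.2f}")
    ```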


    Intelligence benevolent tools: A global system automating integration of structured and semistructured sources in one process

    INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, Issue 6 2004
    Mbale Jameson
    In this article, we investigate a global mechanism that merges and automates interoperability of heterogeneous structured and semistructured sources in one process. In particular, we introduce the intelligence benevolent tool (IBT) system, comprising tools such as assertions, integration rules, conceptual model constructs, and agents that boost the architectural components' versatility to reconcile the semantics involved in data sharing. Going by the title, the term benevolent here refers to the tools' ability to do what they are told to do. In this way, the tools rejuvenate the system's intelligence to withstand the test of time against the highly dynamic computer technology of the present and future information age. The first three IBTs are passive objects, whereas the agent has a strong versatility to perceive events, perform actions, communicate, make commitments, and satisfy claims. The IBT's vast intelligence allows the system to filter out and process only the relevant operational sources, such as preferences (i.e., customer interests), from the sites. In addition, the IBT's richness in knowledge and flexibility to accommodate various data models make it possible to smoothly link system to system or firm to firm in any field, such as engineering, insurance, medicine, space science, and education, to mention a few. © 2004 Wiley Periodicals, Inc. [source]


    Stationary-Increment Variance-Gamma and t Models: Simulation and Parameter Estimation

    INTERNATIONAL STATISTICAL REVIEW, Issue 2 2008
    Richard Finlay
    Summary We detail a method of simulating data from long range dependent processes with variance-gamma or t distributed increments, test various estimation procedures [method of moments (MOM), product-density maximum likelihood (PMLE), non-standard minimum χ², and empirical characteristic function estimation] on the data, and assess the performance of each. The investigation is motivated by the apparent poor performance of the MOM technique on real data (Tjetjep & Seneta, 2006) and by the need to assess the performance of PMLE for our dependent data models. In the simulations considered, the product-density method performs favourably. [source]
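
    For the variance-gamma case, the i.i.d.-increment building block is straightforward to simulate by gamma subordination, as sketched below together with crude moment checks; this is not the long range dependent construction studied in the paper, and the parameter values are arbitrary.

    ```python
    # Minimal sketch: simulate variance-gamma increments by gamma subordination
    # and check the first two moments against their theoretical values.
    import numpy as np

    rng = np.random.default_rng(11)

    def vg_increments(n, mu=0.0, theta=0.05, sigma=0.2, nu=0.5):
        """X = mu + theta*G + sigma*sqrt(G)*Z with G ~ Gamma(1/nu, scale=nu), so E[G] = 1."""
        g = rng.gamma(shape=1.0 / nu, scale=nu, size=n)
        z = rng.standard_normal(n)
        return mu + theta * g + sigma * np.sqrt(g) * z

    x = vg_increments(200_000)

    print(x.mean(), 0.0 + 0.05)             # E[X] = mu + theta
    print(x.var(), 0.2**2 + 0.05**2 * 0.5)  # Var[X] = sigma^2 + theta^2 * nu
    ```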


    Dynamic treatment effect analysis of TV effects on child cognitive development

    JOURNAL OF APPLIED ECONOMETRICS, Issue 3 2010
    Fali Huang
    We investigate whether TV watching at ages 6–7 and 8–9 affects cognitive development measured by math and reading scores at ages 8–9, using a rich childhood longitudinal sample from NLSY79. Dynamic panel data models are estimated to handle the unobserved child-specific factor, endogeneity of TV watching, and dynamic nature of the causal relation. A special emphasis is placed on the last aspect, where TV watching affects cognitive development, which in turn affects future TV watching. When this feedback occurs, it is not straightforward to identify and estimate the TV effect. We develop a two-stage estimation method which can deal with the feedback feature; we also apply the 'standard' econometric panel data approaches. Overall, for math score at ages 8–9, we find that watching TV during ages 6–7 and 8–9 has a negative total effect, mostly due to a large negative effect of TV watching at the younger ages 6–7. For reading score, there is evidence that watching no more than 2 hours of TV per day has a positive effect, whereas the effect is negative outside this range. In both cases, however, the effect magnitudes are economically small. Copyright © 2010 John Wiley & Sons, Ltd. [source]


    Semiparametric Bayesian inference for dynamic Tobit panel data models with unobserved heterogeneity

    JOURNAL OF APPLIED ECONOMETRICS, Issue 6 2008
    Tong Li
    This paper develops semiparametric Bayesian methods for inference of dynamic Tobit panel data models. Our approach requires that the conditional mean dependence of the unobserved heterogeneity on the initial conditions and the strictly exogenous variables be specified. Important quantities of economic interest such as the average partial effect and average transition probabilities can be readily obtained as a by-product of the Markov chain Monte Carlo run. We apply our method to study female labor supply using a panel data set from the National Longitudinal Survey of Youth 1979. Copyright © 2008 John Wiley & Sons, Ltd. [source]


    Heterogeneity and cross section dependence in panel data models: theory and applications introduction

    JOURNAL OF APPLIED ECONOMETRICS, Issue 2 2007
    Badi H. Baltagi
    The papers included in this special issue are primarily concerned with the problem of cross section dependence and heterogeneity in the analysis of panel data models and their relevance in applied econometric research. Cross section dependence can arise due to spatial or spill-over effects, or could be due to unobserved (or unobservable) common factors. Much of the recent research on non-stationary panel data has focused on this problem. It was clear that the first generation panel unit root and cointegration tests developed in the 1990s, which assumed cross-sectional independence, are inadequate and could lead to significant size distortions in the presence of neglected cross-section dependence. Second generation panel unit root and cointegration tests that take account of possible cross-section dependence in the data have been developed; see the recent surveys by Choi (2006) and Breitung and Pesaran (2007). The papers by Baltagi, Bresson and Pirotte, Choi and Chue, Kapetanios, and Pesaran in this special issue are further contributions to this literature. The papers by Fachin, and Moon and Perron are empirical studies in this area. Controlling for heterogeneity has also been an important concern for empirical researchers, with panel data methods promising a better handle on heterogeneity than cross-section data methods. The papers by Hsiao, Shen, Wang and Weeks, Pedroni, and Serlenga and Shin are empirical contributions to this area. Copyright © 2007 John Wiley & Sons, Ltd. [source]
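
    A simple and widely used diagnostic for this kind of cross-section dependence is Pesaran's CD statistic, built from the average pairwise correlation of residuals. The sketch below computes it on a synthetic N x T panel; it is a textbook formula shown for illustration, not code from any of the papers in the special issue.

    ```python
    # Minimal sketch of the Pesaran CD statistic for cross-section dependence.
    import numpy as np

    rng = np.random.default_rng(13)
    N, T = 30, 50

    # Residuals that share an unobserved common factor, hence are cross-sectionally dependent.
    common_factor = rng.standard_normal(T)
    loadings = rng.normal(1.0, 0.3, N)
    resid = loadings[:, None] * common_factor + rng.standard_normal((N, T))

    corr = np.corrcoef(resid)                       # N x N correlations across units
    pairwise = corr[np.triu_indices(N, k=1)]
    cd = np.sqrt(2 * T / (N * (N - 1))) * pairwise.sum()

    # CD is asymptotically standard normal under cross-sectional independence.
    print(f"CD = {cd:.2f}  (|CD| >> 1.96 signals cross-section dependence)")
    ```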


    Health care reform and the number of doctor visits – an econometric analysis

    JOURNAL OF APPLIED ECONOMETRICS, Issue 4 2004
    Rainer Winkelmann
    This paper evaluates the German health care reform of 1997, using the individual number of doctor visits as outcome measure and data from the German Socio-Economic Panel for the years 1995–1999. A number of modified count data models allow us to estimate the effect of the reform in different parts of the distribution. The overall effect of the reform was a 10% reduction in the number of doctor visits. The effect was much larger in the lower part of the distribution than in the upper part. Copyright © 2004 John Wiley & Sons, Ltd. [source]


    A score test for non-nested hypotheses with applications to discrete data models

    JOURNAL OF APPLIED ECONOMETRICS, Issue 5 2001
    J. M. C. Santos Silva
    In this paper it is shown that a convenient score test against non-nested alternatives can be constructed from the linear combination of the likelihood functions of the competing models. This is essentially a test for the correct specification of the conditional distribution of the variable of interest. Given its characteristics, the proposed test is particularly attractive to check the distributional assumptions in models for discrete data. The usefulness of the test is illustrated with an application to models for recreational boating trips. Copyright © 2001 John Wiley & Sons, Ltd. [source]


    Firm, market, and regulatory factors influencing innovation and commercialization in Canada's functional food and nutraceutical sector

    AGRIBUSINESS : AN INTERNATIONAL JOURNAL, Issue 2 2008
    Deepananda Herath
    Factors influencing the development and commercialization of functional food and nutraceutical (FFN) products are explored. Count data models are developed to relate firm, market, and regulatory covariates to the number of FFN product lines firms have under development, on the market, and in total. Canadian firm-level innovation data were taken from Statistics Canada (2003) Functional Food and Nutraceutical Survey. Firms involved in product development/scale-up had more product lines in total and on the market. Firms with a strong and positive perception of the impact of regulatory reform related to generic health claims and harmonization of Canadian regulations with U.S. regulations had fewer product lines in total and on the market. Firms with more positive perceptions of the business impact of structure and function health claims had more product lines on the market. One implication of the study is the importance of developing policies and reforming regulations which better enable use of generic health claims on FFN products. Further, policies which better enable or foster development/scale-up of product lines would increase the Canadian FFN sector's ability to develop new products. [EconLit: O130, L500, Q180]. © 2008 Wiley Periodicals, Inc. [source]


    Forecasting with panel data

    JOURNAL OF FORECASTING, Issue 2 2008
    Badi H. Baltagi
    Abstract This paper gives a brief survey of forecasting with panel data. It begins with a simple error component regression model and surveys the best linear unbiased prediction under various assumptions of the disturbance term. This includes various ARMA models as well as spatial autoregressive models. The paper also surveys how these forecasts have been used in panel data applications, running horse races between heterogeneous and homogeneous panel data models using out-of-sample forecasts. Copyright © 2008 John Wiley & Sons, Ltd. [source]
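
    For the simple one-way error component model, the BLUP surveyed here amounts to adding a shrunken individual mean residual to the common fit, as in the sketch below on synthetic data (the variance components are treated as known and pooled OLS stands in for GLS, purely to keep the illustration short).

    ```python
    # Minimal sketch of BLUP-style forecasting in a one-way error component panel.
    import numpy as np

    rng = np.random.default_rng(17)
    N, T = 100, 8
    beta, sigma_mu, sigma_nu = 2.0, 1.0, 0.5

    x = rng.standard_normal((N, T))
    mu = rng.normal(0.0, sigma_mu, N)                 # individual effects
    y = beta * x + mu[:, None] + rng.normal(0.0, sigma_nu, (N, T))

    # Pooled OLS slope (stands in for GLS here) and its residuals.
    b_hat = float((x * y).sum() / (x ** 2).sum())
    resid = y - b_hat * x

    # BLUP correction: shrink each individual's mean residual toward zero.
    shrink = T * sigma_mu**2 / (sigma_nu**2 + T * sigma_mu**2)
    x_future = rng.standard_normal(N)                 # hypothetical future regressor values
    forecast = b_hat * x_future + shrink * resid.mean(axis=1)
    print(forecast[:5].round(2))
    ```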


    On-line expectation–maximization algorithm for latent data models

    JOURNAL OF THE ROYAL STATISTICAL SOCIETY: SERIES B (STATISTICAL METHODOLOGY), Issue 3 2009
    Olivier Cappé
    Summary. We propose a generic on-line (also sometimes called adaptive or recursive) version of the expectation–maximization (EM) algorithm applicable to latent variable models of independent observations. Compared with the algorithm of Titterington, this approach is more directly connected to the usual EM algorithm and does not rely on integration with respect to the complete-data distribution. The resulting algorithm is usually simpler and is shown to achieve convergence to the stationary points of the Kullback–Leibler divergence between the marginal distribution of the observation and the model distribution at the optimal rate, i.e. that of the maximum likelihood estimator. In addition, the approach proposed is also suitable for conditional (or regression) models, as illustrated in the case of the mixture of linear regressions model. [source]
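
    The sketch below shows the flavour of such an on-line EM recursion for a two-component univariate Gaussian mixture: a stochastic-approximation update of the expected sufficient statistics followed by the usual M-step map. The step-size schedule, burn-in, and starting values are ad hoc illustrative choices, not the paper's recommendations.

    ```python
    # Minimal sketch of on-line EM for a two-component 1-D Gaussian mixture.
    import numpy as np

    rng = np.random.default_rng(23)

    # Stream of observations from a hypothetical two-component mixture.
    n = 50_000
    comp = rng.binomial(1, 0.3, n)
    stream = np.where(comp == 1, rng.normal(2.0, 0.5, n), rng.normal(-1.0, 1.0, n))

    # Parameters (weights w, means m, variances v) and running sufficient statistics.
    w, m, v = np.array([0.5, 0.5]), np.array([-0.5, 0.5]), np.array([1.0, 1.0])
    s0, s1, s2 = w.copy(), w * m, w * (v + m**2)

    for i, x in enumerate(stream, start=1):
        gamma = i ** -0.6                               # slowly decreasing step size
        # E-step for the single new observation: posterior component probabilities.
        dens = w * np.exp(-0.5 * (x - m) ** 2 / v) / np.sqrt(2 * np.pi * v)
        r = dens / dens.sum()
        # Stochastic-approximation update of the expected sufficient statistics.
        s0 = (1 - gamma) * s0 + gamma * r
        s1 = (1 - gamma) * s1 + gamma * r * x
        s2 = (1 - gamma) * s2 + gamma * r * x**2
        if i > 50:                                      # skip early M-steps for stability
            w, m = s0, s1 / s0
            v = np.maximum(s2 / s0 - m**2, 1e-6)

    print(w.round(2), m.round(2), np.sqrt(v).round(2))  # roughly [0.7 0.3], [-1 2], [1 0.5]
    ```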


    SPLASH: Systematic proteomics laboratory analysis and storage hub

    PROTEINS: STRUCTURE, FUNCTION AND BIOINFORMATICS, Issue 6 2006
    Siaw Ling Lo
    Abstract In the field of proteomics, the increasing difficulty to unify the data format, due to the different platforms/instrumentation and laboratory documentation systems, greatly hinders experimental data verification, exchange, and comparison. Therefore, it is essential to establish standard formats for every necessary aspect of proteomics data. One of the recently published data models is the proteomics experiment data repository [Taylor, C. F., Paton, N. W., Garwood, K. L., Kirby, P. D. et al., Nat. Biotechnol. 2003, 21, 247–254]. Compliant with this format, we developed the systematic proteomics laboratory analysis and storage hub (SPLASH) database system as an informatics infrastructure to support proteomics studies. It consists of three modules and provides proteomics researchers a common platform to store, manage, search, analyze, and exchange their data. (i) Data maintenance includes experimental data entry and update, uploading of experimental results in batch mode, and data exchange in the original PEDRo format. (ii) The data search module provides several means to search the database, to view either the protein information or the differential expression display by clicking on a gel image. (iii) The data mining module contains tools that perform biochemical pathway, statistics-associated gene ontology, and other comparative analyses for all the sample sets to interpret its biological meaning. These features make SPLASH a practical and powerful tool for the proteomics community. [source]


    Omitted variables in longitudinal data models

    THE CANADIAN JOURNAL OF STATISTICS, Issue 4 2001
    Edward W. Frees
    Abstract The omission of important variables is a well-known model specification issue in regression analysis and mixed linear models. The author considers longitudinal data models that are special cases of the mixed linear models; in particular, they are linear models of repeated observations on a subject. Models of omitted variables have origins in both the econometrics and biostatistics literatures. The author describes regression coefficient estimators that are robust to, and that provide the basis for detecting, the influence of certain types of omitted variables. New robust estimators and omitted variable tests are introduced and illustrated with a case study that investigates the determinants of tax liability. [source]


    Consistent estimation of binary-choice panel data models with heterogeneous linear trends

    THE ECONOMETRICS JOURNAL, Issue 2 2006
    Alban Thomas
    Summary. This paper presents an extension of fixed effects binary choice models for panel data, to the case of heterogeneous linear trends. Two estimators are proposed: a Logit estimator based on double conditioning and a semiparametric, smoothed maximum score estimator based on double differences. We investigate small-sample properties of these estimators with a Monte Carlo simulation experiment, and compare their statistical properties with standard fixed effects procedures. An empirical application to land renting decisions of Russian households between 1996 and 2002 is proposed. [source]


    Testing for stationarity in heterogeneous panel data

    THE ECONOMETRICS JOURNAL, Issue 2 2000
    Kaddour Hadri
    This paper proposes a residual-based Lagrange multiplier (LM) test for a null that the individual observed series are stationary around a deterministic level or around a deterministic trend, against the alternative of a unit root in panel data. The tests, which are asymptotically similar under the null, belong to the locally best invariant (LBI) test statistics. The asymptotic distributions of the statistics are derived under the null and are shown to be normally distributed. Finite sample sizes and powers are considered in a Monte Carlo experiment. The empirical sizes of the tests are close to the true size even in small samples. The testing procedure is easy to apply, including to panel data models with fixed effects, individual deterministic trends and heterogeneous errors across cross-sections. It is also shown how to apply the tests to the more general case of serially correlated disturbance terms. [source]
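
    A minimal sketch of a residual-based LM panel stationarity statistic of this type is given below for the level-stationary case with unit-specific error variances, on synthetic data generated under the null. The standardising moments used (mean 1/6, variance 1/45) are the ones usually quoted for this case and no serial-correlation correction is applied, so treat it as an illustration rather than a faithful re-implementation of the paper's procedure.

    ```python
    # Minimal sketch of a residual-based LM panel stationarity statistic (level case).
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(29)
    N, T = 25, 200

    # Panel of stationary series around unit-specific levels (i.e., the null holds).
    levels = rng.normal(0.0, 2.0, N)
    y = levels[:, None] + rng.standard_normal((N, T))

    e = y - y.mean(axis=1, keepdims=True)         # residuals from demeaning each series
    S = np.cumsum(e, axis=1)                      # partial sums of residuals
    sigma2 = (e ** 2).mean(axis=1)                # unit-specific error variances
    lm_bar = ((S ** 2).sum(axis=1) / (T ** 2 * sigma2)).mean()

    # Standardise with the moments for the level-stationary case.
    z = np.sqrt(N) * (lm_bar - 1.0 / 6.0) / np.sqrt(1.0 / 45.0)
    print(f"Z = {z:.2f},  one-sided p-value = {1 - stats.norm.cdf(z):.3f}")
    ```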


    Dynamic or Static Capabilities? Process Management Practices and Response to Technological Change

    THE JOURNAL OF PRODUCT INNOVATION MANAGEMENT, Issue 5 2009
    Whether and how organizations adapt to changes in their environments has been a prominent theme in organization and strategy research. Within this research, there is controversy about whether organizational routines hamper or facilitate adaptation. Organizational routines give rise to inertia but are also the vehicles for change in recent work on dynamic capabilities. This rising interest in routines in research coincides with an increase in management practices focused on organizational routines and processes. This study explores how the increasing use of process management practices affected organizational response to a major technological change through new product developments. The empirical setting is the photography industry over a decade, during the shift from silver-halide chemistry to digital technology. The advent and rise of practices associated with the new ISO 9000 certification program in the 1990s coincided with increasing technological substitution in photography, allowing for assessing how increasing attention to routines through ISO 9000 practices over time affected ongoing responsiveness to the technological change. The study further compares the effects for the incumbent firms in the existing technology with nonincumbent firms entering from elsewhere. Relying on longitudinal panel data models as well as hazard models, findings show that greater process management practices dampened response to new generations of digital technology, but this effect differed for incumbents and nonincumbents. Increasing use of process management practices over time had a greater negative effect on incumbents' response to the rapid technological change. The study contributes to research in technological change by highlighting specific management practices that may create disconnects between firms' capabilities and changing environments and disadvantage incumbents in the face of radical technological change. This research also contributes to literature on organizational routines and capabilities. Studying the effects of increasing ISO 9000 practices undertaken in firms provides an opportunity to gauge the effects of systematic routinization of organizational activities and their effects on adaptation. This research also contributes to management practice. The promise of process management is to help firms adapt to changing environments, and, as such, managers facing technological change may adopt process management practices as a response to uncertainty and change. But managers must more fully understand the potential benefits and risks of process management to ensure these practices are used in the appropriate contexts. [source]


    Structured additive regression for overdispersed and zero-inflated count data

    APPLIED STOCHASTIC MODELS IN BUSINESS AND INDUSTRY, Issue 4 2006
    Ludwig Fahrmeir
    Abstract In count data regression there can be several problems that prevent the use of the standard Poisson log-linear model: overdispersion caused by unobserved heterogeneity or correlation, an excess of zeros, non-linear effects of continuous covariates or of time scales, and spatial effects. We develop Bayesian count data models that can deal with these issues simultaneously and within a unified inferential approach. Models for overdispersed or zero-inflated data are combined with semiparametrically structured additive predictors, resulting in a rich class of count data regression models. Inference is fully Bayesian and is carried out by computationally efficient MCMC techniques. Simulation studies investigate performance, in particular how well different model components can be identified. Applications to patent data and to data from a car insurer illustrate the potential and, to some extent, limitations of our approach. Copyright © 2006 John Wiley & Sons, Ltd. [source]
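
    As a small illustration of the zero-inflation building block (with a plain linear predictor rather than the structured additive predictors developed in the paper), the sketch below fits a zero-inflated Poisson model to synthetic data by maximum likelihood; all names and values are hypothetical.

    ```python
    # Minimal sketch of a zero-inflated Poisson model fitted by maximum likelihood.
    import numpy as np
    from scipy.optimize import minimize
    from scipy.special import expit, gammaln

    rng = np.random.default_rng(31)
    n = 4000
    x = rng.standard_normal(n)
    X = np.column_stack([np.ones(n), x])

    # True process: an extra zero with probability pi, otherwise Poisson(exp(0.5 + 0.8 x)).
    pi_true, mu = 0.25, np.exp(0.5 + 0.8 * x)
    y = np.where(rng.random(n) < pi_true, 0, rng.poisson(mu))

    def neg_loglik(params):
        beta, pi = params[:2], expit(params[2])
        lam = np.exp(X @ beta)
        log_pois = -lam + y * np.log(lam) - gammaln(y + 1)
        ll = np.where(y == 0,
                      np.log(pi + (1 - pi) * np.exp(-lam)),   # structural or sampling zero
                      np.log(1 - pi) + log_pois)
        return -ll.sum()

    fit = minimize(neg_loglik, x0=np.zeros(3), method="BFGS")
    print(fit.x[:2].round(2), round(float(expit(fit.x[2])), 2))   # about [0.5, 0.8] and 0.25
    ```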


    Modelling the impact of oxygenated VOC and meteorology upon the boundary layer photochemistry at the South Pole

    ATMOSPHERIC SCIENCE LETTERS, Issue 1 2007
    P. D. Hamer
    Abstract A chemistry box model is used to explore implications of recent measurements of methyl hydroperoxide (MHP) across Antarctica and their influence upon high ozone events in the South Pole boundary layer. To reconcile the recent data, the model results suggest that both chemistry and meteorology play an important role. Copyright © 2007 Royal Meteorological Society [source]


    Effective directed tests for models with ordered categorical data

    AUSTRALIAN & NEW ZEALAND JOURNAL OF STATISTICS, Issue 3 2003
    Arthur Cohen
    Summary This paper offers a new method for testing one-sided hypotheses in discrete multivariate data models. One-sided alternatives mean that there are restrictions on the multidimensional parameter space. The focus is on models dealing with ordered categorical data. In particular, applications are concerned with R×C contingency tables. The method has advantages over other general approaches. All tests are exact in the sense that no large sample theory or large sample distribution theory is required. Testing is unconditional although its execution is done conditionally, section by section, where a section is determined by marginal totals. This eliminates any potential nuisance parameter issues. The power of the tests is more robust than the power of the typical linear tests often recommended. Furthermore, computer programs are available to carry out the tests efficiently regardless of the sample sizes or the order of the contingency tables. Both censored data and uncensored data models are discussed. [source]


    Implementing a management system architecture framework

    BELL LABS TECHNICAL JOURNAL, Issue 4 2000
    William C. Goers
    Any practical vision for the evolution of communications services must include a strategy for how networking vendors make it possible for service providers to manage their networks. While the Telecommunications Management Network (TMN) framework has proponents, the IP services community has shown little interest. Furthermore, operations systems developers have long attempted to produce the best framework, but the technology is outdated before it exists. This paper addresses both issues by presenting an application-driven model for integrated management. This model can be applied to either a "classic" framework orientation or a management application view. What is common between these two views is a management portal, common data models, multiple interface technologies, open and simple network element interfaces, and common operations, administration, and maintenance (OA&M) tools. These are the elements for which there needs to be a consistent set of interface definitions. They form the basis for the construction of next-generation management applications. [source]


    Generalized Hierarchical Multivariate CAR Models for Areal Data

    BIOMETRICS, Issue 4 2005
    Xiaoping Jin
    Summary In the fields of medicine and public health, a common application of areal data models is the study of geographical patterns of disease. When we have several measurements recorded at each spatial location (for example, information on p ≥ 2 diseases from the same population groups or regions), we need to consider multivariate areal data models in order to handle the dependence among the multivariate components as well as the spatial dependence between sites. In this article, we propose a flexible new class of generalized multivariate conditionally autoregressive (GMCAR) models for areal data, and show how it enriches the MCAR class. Our approach differs from earlier ones in that it directly specifies the joint distribution for a multivariate Markov random field (MRF) through the specification of simpler conditional and marginal models. This in turn leads to a significant reduction in the computational burden in hierarchical spatial random effect modeling, where posterior summaries are computed using Markov chain Monte Carlo (MCMC). We compare our approach with existing MCAR models in the literature via simulation, using average mean square error (AMSE) and a convenient hierarchical model selection criterion, the deviance information criterion (DIC; Spiegelhalter et al., 2002, Journal of the Royal Statistical Society, Series B, 64, 583–639). Finally, we offer a real-data application of our proposed GMCAR approach that models lung and esophagus cancer death rates during 1991–1998 in Minnesota counties. [source]
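
    The univariate proper CAR model that such multivariate constructions generalise is easy to write down: a precision matrix Q = tau * (D - rho * W) built from an areal adjacency matrix W. The sketch below assembles Q for a small hypothetical lattice of areas and draws one vector of spatial random effects; it illustrates the building block only, not the GMCAR specification itself.

    ```python
    # Minimal sketch: a proper CAR precision matrix for a small lattice map,
    # plus one draw of the spatial random effects.
    import numpy as np

    rng = np.random.default_rng(37)

    side = 4                                         # 4 x 4 lattice of "counties"
    n = side * side
    W = np.zeros((n, n))
    for i in range(side):
        for j in range(side):
            k = i * side + j
            if j + 1 < side:
                W[k, k + 1] = W[k + 1, k] = 1        # east-west neighbours
            if i + 1 < side:
                W[k, k + side] = W[k + side, k] = 1  # north-south neighbours

    D = np.diag(W.sum(axis=1))
    tau, rho = 2.0, 0.9                              # |rho| < 1 keeps Q positive definite here
    Q = tau * (D - rho * W)

    # Draw phi ~ N(0, Q^{-1}) using the Cholesky factor of the precision matrix.
    L = np.linalg.cholesky(Q)
    phi = np.linalg.solve(L.T, rng.standard_normal(n))
    print(phi.reshape(side, side).round(2))
    ```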