Model Selection (model + selection)

Kinds of Model Selection

  • bayesian model selection
  • information-theoretic model selection

Terms modified by Model Selection

  • model selection approach
  • model selection criterion
  • model selection procedure

Selected Abstracts


    It is beneficial to observe that popular model selection criteria for the linear model are equivalent to penalized versions of R2. Let PR2 refer to any one of these model selection criteria. Then PR2 serves the dual purpose of selecting the model and summarizing the resulting fit subject to the penalty function. Furthermore, it is straightforward to extend the logic of PR2 to instrumental variables estimation and the non-parametric selection of regressors. For two-stage least squares estimation, a simulation study investigates the finite-sample performance of PR2 to select the correct model in cases of either strong or weak instruments. [source]
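
The equivalence mentioned above can be illustrated with a small sketch. For a Gaussian linear model, AIC equals n·log(RSS/n) + 2k, so minimizing AIC over models fitted to the same data is the same as maximizing 1 − (1 − R2)·exp(2k/n), which is a penalized R2. The function names and the exact form of the penalty below are assumptions made for illustration; they are not taken from the abstract, whose PR2 definition may differ.

```python
import math


def linear_model_criteria(rss, tss, n, k):
    """Common selection criteria for a Gaussian linear model.

    rss: residual sum of squares
    tss: total sum of squares about the mean
    n:   number of observations
    k:   number of fitted coefficients (including the intercept)
    """
    r2 = 1.0 - rss / tss
    adj_r2 = 1.0 - (rss / (n - k)) / (tss / (n - 1))
    # AIC and BIC up to an additive constant shared by all models
    aic = n * math.log(rss / n) + 2 * k
    bic = n * math.log(rss / n) + k * math.log(n)
    return {"r2": r2, "adj_r2": adj_r2, "aic": aic, "bic": bic}


def penalized_r2_from_aic(rss, tss, n, k):
    """Illustrative penalized R2 whose ranking of models matches minimum AIC."""
    return 1.0 - (rss / tss) * math.exp(2.0 * k / n)


# Two hypothetical candidate models fitted to the same 50 observations
model_a = dict(rss=40.0, tss=100.0, n=50, k=2)
model_b = dict(rss=39.0, tss=100.0, n=50, k=5)
crit_a = linear_model_criteria(**model_a)
crit_b = linear_model_criteria(**model_b)
pr2_a = penalized_r2_from_aic(**model_a)
pr2_b = penalized_r2_from_aic(**model_b)
```

In this made-up example the smaller model has both the lower AIC and the higher penalized R2, so the two rules pick the same model.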


    Paul Kabaila
    Summary We consider a linear regression model, with the parameter of interest a specified linear combination of the components of the regression parameter vector. We suppose that, as a first step, a data-based model selection (e.g. by preliminary hypothesis tests or minimizing the Akaike information criterion, AIC) is used to select a model. It is common statistical practice to then construct a confidence interval for the parameter of interest, based on the assumption that the selected model had been given to us a priori. This assumption is false, and it can lead to a confidence interval with poor coverage properties. We provide an easily computed finite-sample upper bound (calculated by repeated numerical evaluation of a double integral) to the minimum coverage probability of this confidence interval. This bound applies for model selection by any of the following methods: minimum AIC, minimum Bayesian information criterion (BIC), maximum adjusted R2, minimum Mallows' CP and t-tests. The importance of this upper bound is that it delineates general categories of design matrices and model selection procedures for which this confidence interval has poor coverage properties. This upper bound is shown to be a finite-sample analogue of an earlier large-sample upper bound due to Kabaila and Leeb. [source]


    Samuel Müller
    Summary We propose a new approach to the selection of partially linear models based on the conditional expected prediction square loss function, which is estimated using the bootstrap. Because of the different speeds of convergence of the linear and the nonlinear parts, a key idea is to select each part separately. In the first step, we select the nonlinear components using an 'm-out-of-n' residual bootstrap that ensures good properties for the nonparametric bootstrap estimator. The second step selects the linear components from the remaining explanatory variables, and the non-zero parameters are selected based on a two-level residual bootstrap. We show that the model selection procedure is consistent under some conditions, and our simulations suggest that it selects the true model more often than the other selection procedures considered. [source]


    Gerda Claeskens
    Summary In order to make predictions of future values of a time series, one needs to specify a forecasting model. A popular choice is an autoregressive time-series model, for which the order of the model is chosen by an information criterion. We propose an extension of the focused information criterion (FIC) for model-order selection, with emphasis on a high predictive accuracy (i.e. the mean squared forecast error is low). We obtain theoretical results and illustrate by means of a simulation study and some real data examples that the FIC is a valid alternative to the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) for selection of a prediction model. We also illustrate the possibility of using the FIC for purposes other than forecasting, and explore its use in an extended model. [source]

    Schwarz, Wallace, and Rissanen: Intertwining Themes in Theories of Model Selection

    Aaron D. Lanterman
    Summary Investigators interested in model order estimation have tended to divide themselves into widely separated camps; this survey of the contributions of Schwarz, Wallace, Rissanen, and their coworkers attempts to build bridges between the various viewpoints, illuminating connections which may have previously gone unnoticed and clarifying misconceptions which seem to have propagated in the applied literature. Our tour begins with Schwarz's approximation of Bayesian integrals via Laplace's method. We then introduce the concepts underlying Rissanen's minimum description length principle via a Bayesian scenario with a known prior; this provides the groundwork for understanding his more complex non-Bayesian MDL which employs a "universal" encoding of the integers. Rissanen's method of parameter truncation is contrasted with that employed in various versions of Wallace's minimum message length criteria. Rissanen's more recent notion of stochastic complexity is outlined in terms of Bernardo's information-theoretic derivation of the Jeffreys prior. Résumé (translated from the French) There are two very different schools of thought in research on model order. This paper reviews the contributions of Schwarz, Wallace, Rissanen, and their collaborators. Its aim is to bring their points of view closer together, to establish new connections between certain problems, and to correct some erroneous interpretations that have appeared in the applied literature. Our review begins with the approximation of Bayesian integrals by Laplace's method, as studied by Schwarz. We then introduce Rissanen's minimum description length principle in the setting of a Bayesian estimation scenario. This allows a new interpretation of his estimation methods based on a "universal" coding of the natural integers. We compare Rissanen's parameter technique with those used by Wallace in his minimum message length criterion method. We conclude with a presentation of Rissanen's notion of stochastic complexity and its connections with the Jeffreys distribution, for which Bernardo presented a derivation based on information theory. [source]

    Bayesian Subset Model Selection for Time Series

    N. K. Unnikrishnan
    Abstract. This paper considers the problem of subset model selection for time series. In general, a few lags, which are not necessarily continuous, explain the lag structure of a time-series model. Using the reversible jump Markov chain technique, the paper develops a fully Bayesian solution for the problem. The method is illustrated using the self-exciting threshold autoregressive (SETAR), bilinear and AR models. The Canadian lynx data, Wolfe's sunspot numbers and Series A of Box and Jenkins (1976) are analysed in detail. [source]

    Model Selection for Broadband Semiparametric Estimation of Long Memory in Time Series

    Clifford M. Hurvich
    We study the properties of Mallows' CL criterion for selecting a fractional exponential (FEXP) model for a Gaussian long-memory time series. The aim is to minimize the mean squared error of a corresponding regression estimator $d_{\mathrm{FEXP}}$ of the memory parameter, $d$. Under conditions which do not require that the data were actually generated by a FEXP model, it is known that the mean squared error $\mathrm{MSE} = E[(d_{\mathrm{FEXP}} - d)^2]$ can converge to zero as fast as $(\log n)/n$, where $n$ is the sample size, assuming that the number of parameters grows slowly with $n$ in a deterministic fashion. Here, we suppose that the number of parameters in the FEXP model is chosen so as to minimize a local version of CL, restricted to frequencies in a neighborhood of zero. We show that, under appropriate conditions, the expected value of the local CL is asymptotically equivalent to MSE. A combination of theoretical and simulation results gives guidance as to the choice of the degree of locality in CL. [source]

    Cross-validation Criteria for SETAR Model Selection

    Jan G. De Gooijer
    Three cross-validation criteria, denoted respectively by C, Cc, and Cu, are proposed for selecting the orders of a self-exciting threshold autoregressive (SETAR) model when both the delay and the threshold value are unknown. The derivation of C is within a natural cross-validation framework. The criterion Cc is similar in spirit to AICc, a bias-corrected version of AIC for SETAR model selection introduced by Wong and Li (1998). The criterion Cu is a variant of Cc with a property similar to that of AICu, a model selection criterion proposed by McQuarrie et al. (1997) for linear models. In a Monte Carlo study, the performance of each of the criteria C, Cc, Cu, AIC, AICc, AICu, and BIC is investigated in detail for various models and various sample sizes. It will be shown that Cu consistently outperforms all other criteria when the sample size is moderate to large. [source]

    Model Selection for Monetary Policy Analysis: How Important is Empirical Validity?

    Q. Farooq Akram
    Abstract We investigate the economic significance of trading off empirical validity of models against other desirable model properties. Our investigation is based on three alternative econometric systems of the supply side, in a model that can be used to discuss optimal monetary policy in Norway. Our results caution against compromising empirical validity when selecting a model for policy analysis. We also find large costs from basing policies on the robust model, or on a suite of models, even when it contains the valid model. This confirms an important role for econometric modelling and evaluation in model choice for policy analysis. [source]

    Linear Mixed Model Selection for False Discovery Rate Control in Microarray Data Analysis

    BIOMETRICS, Issue 2 2010
    Cumhur Yusuf Demirkale
    Summary In a microarray experiment, one experimental design is used to obtain expression measures for all genes. One popular analysis method involves fitting the same linear mixed model for each gene, obtaining gene-specific p-values for tests of interest involving fixed effects, and then choosing a threshold for significance that is intended to control false discovery rate (FDR) at a desired level. When one or more random factors have zero variance components for some genes, the standard practice of fitting the same full linear mixed model for all genes can result in failure to control FDR. We propose a new method that combines results from the fit of full and selected linear mixed models to identify differentially expressed genes and provide FDR control at target levels when the true underlying random effects structure varies across genes. [source]

    Random Effect and Latent Variable Model Selection edited by DUNSON, D. B.

    BIOMETRICS, Issue 3 2009
    Kenneth Rice
    No abstract is available for this article. [source]

    Model Selection and Model Averaging by CLAESKENS, G. and HJORT, N. L.

    BIOMETRICS, Issue 2 2009
    Thomas M. Loughin
    No abstract is available for this article. [source]

    Differential Equation Modeling of HIV Viral Fitness Experiments: Model Identification, Model Selection, and Multimodel Inference

    BIOMETRICS, Issue 1 2009
    Hongyu Miao
    Summary Many biological processes and systems can be described by a set of differential equation (DE) models. However, literature in statistical inference for DE models is very sparse. We propose statistical estimation, model selection, and multimodel averaging methods for HIV viral fitness experiments in vitro that can be described by a set of nonlinear ordinary differential equations (ODE). The parameter identifiability of the ODE models is also addressed. We apply the proposed methods and techniques to experimental data of viral fitness for HIV-1 mutant 103N. We expect that the proposed modeling and inference approaches for the DE models can be widely used for a variety of biomedical studies. [source]

    Exploratory Bayesian Model Selection for Serial Genetics Data

    BIOMETRICS, Issue 2 2005
    Jing X. Zhao
    Summary Characterizing the process by which molecular and cellular level changes occur over time will have broad implications for clinical decision making and help further our knowledge of disease etiology across many complex diseases. However, this presents an analytic challenge due to the large number of potentially relevant biomarkers and the complex, uncharacterized relationships among them. We propose an exploratory Bayesian model selection procedure that searches for model simplicity through independence testing of multiple discrete biomarkers measured over time. Bayes factor calculations are used to identify and compare models that are best supported by the data. For large model spaces, i.e., a large number of multi-leveled biomarkers, we propose a Markov chain Monte Carlo (MCMC) stochastic search algorithm for finding promising models. We apply our procedure to explore the extent to which HIV-1 genetic changes occur independently over time. [source]

    Model Selection for Integrated Recovery/Recapture Data

    BIOMETRICS, Issue 4 2002
    R. King
    Summary. Catchpole et al. (1998, Biometrics 54, 33–46) provide a novel scheme for integrating both recovery and recapture data analyses and derive sufficient statistics that facilitate likelihood computations. In this article, we demonstrate how their efficient likelihood expression can facilitate Bayesian analyses of these kinds of data and extend their methodology to provide a formal framework for model determination. We consider in detail the issue of model selection with respect to a set of recapture/recovery histories of shags (Phalacrocorax aristotelis) and determine, from the enormous range of biologically plausible models available, which best describe the data. By using reversible jump Markov chain Monte Carlo methodology, we demonstrate how this enormous model space can be efficiently and effectively explored without having to resort to performing an infeasibly large number of pairwise comparisons or some ad hoc stepwise procedure. We find that the model used by Catchpole et al. (1998) has essentially zero posterior probability and that, of the 477,144 possible models considered, over 60% of the posterior mass is placed on three neighboring models with biologically interesting interpretations. [source]

    Model Selection in Estimating Equations

    BIOMETRICS, Issue 2 2001
    Wei Pan
    Summary. Model selection is a necessary step in many practical regression analyses. But for methods based on estimating equations, such as the quasi-likelihood and generalized estimating equation (GEE) approaches, there seem to be few well-studied model selection techniques. In this article, we propose a new model selection criterion that minimizes the expected predictive bias (EPB) of estimating equations. A bootstrap smoothed cross-validation (BCV) estimate of EPB is presented and its performance is assessed via simulation for overdispersed generalized linear models. For illustration, the method is applied to a real data set taken from a study of the development of ewe embryos. [source]

    Spatio-temporal point process filtering methods with an application

    ENVIRONMETRICS, Issue 3-4 2010
    ena Frcalová
    Abstract The paper deals with point processes in space and time and the problem of filtering. Real data monitoring the spiking activity of a place cell in the hippocampus of a rat moving in an environment are evaluated. Two approaches to the modelling and methodology are discussed. The first one (known from the literature) is based on recursive equations which make it possible to describe an adaptive system. Sequential Monte Carlo methods, including the particle filter algorithm, are available for the solution. The second approach makes use of a continuous-time shot-noise Cox point process model. The inference of the driving intensity leads to a nonlinear filtering problem. Parametric models support the solution by means of Bayesian Markov chain Monte Carlo methods; moreover, the Cox model makes it possible to detect adaptiveness. Model selection is discussed, and numerical results are presented and interpreted. Copyright © 2009 John Wiley & Sons, Ltd. [source]

    Geographical and taxonomic influences on cranial variation in red colobus monkeys (Primates, Colobinae): introducing a new approach to 'morph' monkeys

    GLOBAL ECOLOGY, Issue 2 2009
    Andrea Cardini
    ABSTRACT Aim: To provide accurate but parsimonious quantitative descriptions of clines in cranial form of red colobus, to partition morphological variance into geographical, taxonomic and structured taxonomic components, and to visually summarize clines in multivariate shape data using a method which produces results directly comparable to both univariate studies of geographical variation and standard geometric morphometric visualization of shape differences along vectors. Location: Equatorial Africa. Methods: Sixty-four three-dimensional cranial landmarks were measured on 276 adult red colobus monkeys sampled over their entire distribution. Geometric morphometric methods were applied, and size and shape variables regressed onto geographical coordinates using linear and curvilinear models. Model selection was done using the second-order Akaike information criterion. Components of variation related to geography, taxon or their combined effect were partitioned using partial regression. Multivariate trends in clinal shape were summarized using principal components of predictions from regressions, plotting vector scores on maps as for univariate size, and visualizing differences along main axes of clinal shape variation using surface rendering. Results: Significant clinal variation was found in size and shape. Clines were similar in females and males. Trend surface analysis tended to be more accurate and parsimonious than alternative models in predicting morphology based on geography. Cranial form was relatively paedomorphic in East Africa and peramorphic in central Africa. Most taxonomic variation was geographically structured. However, taxonomic differences alone accounted for a larger proportion of total explained variance in shape (up to 40%) than in size (, 20%). Main conclusions: A strong cline explained most of the observed size variation and a significant part of the shape differences of red colobus crania. The pattern of geographical variation was largely similar to that previously reported in vervets, despite different habitat preferences (arboreal versus terrestrial) and a long period since divergence (c. 14–15 Myr). This suggests that some aspects of morphological divergence in both groups may have been influenced by similar environmental, geographical and historical factors. Cranial size is likely to be evolutionarily more labile and thus better reflects the influence of recent environmental changes. Cranial shape could be more resilient to change and thus better reflects phylogenetically informative differences. [source]

    Resident and transient dynamics, site fidelity and survival in wintering Blackcaps Sylvia atricapilla: evidence from capture-recapture analyses

    IBIS, Issue 2 2007
    In their winter quarters, migrant birds may either remain within a small area (resident strategy) or move frequently over a large area looking for locally abundant food (transient strategy). It has been suggested that both strategies could simultaneously occur in the same population. We used time-since-marking capture-recapture models to infer the coexistence of these two behavioural strategies (transient and resident) among wintering Blackcaps Sylvia atricapilla using weekly recapture data over a 7-year period. A related question is whether Blackcaps, if surviving to the next winter, always return to the same wintering area, so we also used this approach to analyse winter site fidelity and to estimate annual survival probabilities. Model selection supported the existence of heterogeneity in survival estimates for both the within-season and the interannual survival probabilities, i.e. there was evidence for the existence of transients. It was estimated that 26% of the Blackcaps were resident during the winter. Mean apparent annual survival probability was 0.46 (se = ±0.11). However, there was some evidence suggesting that not all individuals showed winter site fidelity. The estimated proportion of individuals that, if alive, returned to the wintering area was 28%. This is the first study to show the existence of these two behavioural strategies (residence and transience) among wintering Blackcaps, and the first confirming this pattern using capture-recapture models. These models considering transient and resident dynamics may become an important tool with which to analyse wintering strategies. [source]

    Non-parametric statistical methods for multivariate calibration model selection and comparison

    Edward V. Thomas
    Abstract Model selection is an important issue when constructing multivariate calibration models using methods based on latent variables (e.g. partial least squares regression and principal component regression). It is important to select an appropriate number of latent variables to build an accurate and precise calibration model. Inclusion of too few latent variables can result in a model that is inaccurate over the complete space of interest. Inclusion of too many latent variables can result in a model that produces noisy predictions through incorporation of low-order latent variables that have little or no predictive value. Commonly used metrics for selecting the number of latent variables are based on the predicted error sum of squares (PRESS) obtained via cross-validation. In this paper a new approach for selecting the number of latent variables is proposed. In this new approach the prediction errors of individual observations (obtained from cross-validation) are compared across models incorporating varying numbers of latent variables. Based on these comparisons, non-parametric statistical methods are used to select the simplest model (least number of latent variables) that provides prediction quality that is indistinguishable from that provided by more complex models. Unlike methods based on PRESS, this new approach is robust to the effects of anomalous observations. More generally, the same approach can be used to compare the performance of any models that are applied to the same data set where reference values are available. The proposed methodology is illustrated with an industrial example involving the prediction of gasoline octane numbers from near-infrared spectra. Published in 2004 by John Wiley & Sons, Ltd. [source]
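
The selection rule described above can be sketched as follows. The abstract does not say which non-parametric test is used, so this sketch assumes a one-sided paired sign test and takes the model with the smallest median absolute cross-validation error as the reference; both choices, and all function names, are hypothetical and only illustrate the idea of picking the simplest model whose per-observation errors are indistinguishable from those of a more complex one.

```python
import math
import statistics


def sign_test_p(errors_simple, errors_ref):
    """One-sided paired sign test: are errors_simple larger than errors_ref?"""
    wins = sum(1 for a, b in zip(errors_simple, errors_ref) if a > b)
    losses = sum(1 for a, b in zip(errors_simple, errors_ref) if a < b)
    m = wins + losses  # tied pairs are dropped
    if m == 0:
        return 1.0
    # P(X >= wins) for X ~ Binomial(m, 1/2)
    return sum(math.comb(m, j) for j in range(wins, m + 1)) / 2 ** m


def select_num_latent(abs_errors_by_model, alpha=0.05):
    """Pick the smallest number of latent variables that is not clearly worse.

    abs_errors_by_model[i] holds the absolute cross-validation errors of the
    model with i + 1 latent variables, in the same observation order.
    """
    medians = [statistics.median(errs) for errs in abs_errors_by_model]
    ref = medians.index(min(medians))  # assumed reference model
    for i in range(ref):
        p = sign_test_p(abs_errors_by_model[i], abs_errors_by_model[ref])
        if p > alpha:  # no evidence that the simpler model predicts worse
            return i + 1
    return ref + 1


# Made-up absolute prediction errors for models with 1, 2 and 3 latent variables
m1 = [1.0] * 10
m2 = [0.50, 0.52, 0.48, 0.51, 0.49, 0.50, 0.53, 0.47, 0.50, 0.50]
m3 = [0.49, 0.53, 0.47, 0.52, 0.48, 0.49, 0.54, 0.46, 0.49, 0.51]
chosen = select_num_latent([m1, m2, m3])
```

Because the comparison uses only the signs of paired differences, a single anomalous observation cannot dominate the decision the way it can in a sum of squared errors.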

    An analysis of hatching success in the great reed warbler Acrocephalus arundinaceus

    OIKOS, Issue 3 2008
    Jonas Knape
    Hatching success is a potentially important fitness component for avian species. Previous studies of hatching success in natural populations have primarily focused on effects of inbreeding, but a general understanding of variation in hatching success is lacking. We analyse data on hatching success in a population of great reed warblers Acrocephalus arundinaceus in Lake Kvismaren in south central Sweden. The effects of a range of covariates, including three measures of inbreeding as well as effects of classifications in the data (such as identities of individuals), on hatching success are analysed simultaneously. This is done by fitting Bayesian binomial mixed models using Markov chain Monte Carlo methods. Using random effects for each individual parent, we check for unexplained variation in hatching success among male and female individuals and compare it to effects of covariates such as degree of inbreeding. Model selection showed that there was a significant amount of unexplained variation in hatching probability between females. This was manifested by a few females laying eggs with a substantially lower hatching success than the majority of the females. The deviations were of the same order of magnitude as the significant effect of parent relatedness on hatching success. Whereas the negative effect of parent relatedness on hatchability is an expression of inbreeding, the female individual effect is not due to inbreeding and could reflect maternal effects, differences among females in fertilisation and/or incubation ability, or an over-representation of genetic components from the female acting on the early developing embryo. [source]

    Model selection for generalized linear models with factor-augmented predictors

    Tomohiro Ando
    Abstract This paper considers generalized linear models in a data-rich environment in which a large number of potentially useful explanatory variables are available. In particular, it deals with the case that the sample size and the number of explanatory variables are of similar sizes. We adopt the idea that the relevant information of explanatory variables concerning the dependent variable can be represented by a small number of common factors and investigate the issue of selecting the number of common factors while taking into account the effect of estimated regressors. We develop an information criterion under model mis-specification for both the distributional and structural assumptions and show that the proposed criterion is a natural extension of the Akaike information criterion (AIC). Simulations and empirical data analysis demonstrate that the proposed new criterion outperforms the AIC and Bayesian information criterion. Copyright © 2009 John Wiley & Sons, Ltd. [source]

    'Model selection for generalized linear models with factor-augmented predictors'

    W. K. Li
    No abstract is available for this article. [source]

    'Model selection for generalized linear models with factor-augmented predictors'

    Hansheng Wang
    No abstract is available for this article. [source]

    'Model selection for generalized linear models with factor-augmented predictors'

    T. Ando
    First page of article [source]

    Structural Health Monitoring via Measured Ritz Vectors Utilizing Artificial Neural Networks

    Heung-Fai Lam
    Unlike most other pattern recognition methods, an artificial neural network (ANN) technique is employed as a tool for systematically identifying the damage pattern corresponding to an observed feature. An important aspect of using an ANN is its design but this is usually skipped in the literature on ANN-based SHM. The design of an ANN has significant effects on both the training and performance of the ANN. As the multi-layer perceptron ANN model is adopted in this work, ANN design refers to the selection of the number of hidden layers and the number of neurons in each hidden layer. A design method based on a Bayesian probabilistic approach for model selection is proposed. The combination of the pattern recognition method and the Bayesian ANN design method forms a practical SHM methodology. A truss model is employed to demonstrate the proposed methodology. [source]

    The transferability of distribution models across regions: an amphibian case study

    Flavio Zanini
    ABSTRACT Aim: Predicting species distribution is of fundamental importance for ecology and conservation. However, distribution models are usually established for only one region and it is unknown whether they can be transferred to other geographical regions. We studied the distribution of six amphibian species in five regions to address the question of whether the effect of landscape variables varied among regions. We analysed the effect of 10 variables extracted in six concentric buffers (from 100 m to 3 km) describing landscape composition around breeding ponds at different spatial scales. We used data on the occurrence of amphibian species in a total of 655 breeding ponds. We accounted for proximity to neighbouring populations by including a connectivity index in our models. We used logistic regression and information-theoretic model selection to evaluate candidate models for each species. Location: Switzerland. Results: The explained deviance of each species' best models varied between 5% and 32%. Models that included interactions between a region and a landscape variable were always included in the most parsimonious models. For all species, models including region-by-landscape interactions had similar support (Akaike weights) to models that did not include interaction terms. The spatial scale at which landscape variables affected species distribution varied from 100 m to 1000 m, which was in agreement with several recent studies suggesting that land use far away from the ponds can affect pond occupancy. Main conclusions: Different species are affected by different landscape variables at different spatial scales and these effects may vary geographically, resulting in a generally low transferability of distribution models across regions. We also found that connectivity seems generally more important than landscape variables. This suggests that metapopulation processes may play a more important role in species distribution than habitat characteristics. [source]
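
The Akaike weights mentioned above have a standard definition: each candidate model's weight is exp(−Δ/2) divided by the sum of that quantity over all candidates, where Δ is the model's AIC minus the smallest AIC in the set. A minimal sketch follows; the AIC values are invented for illustration and are not from the study.

```python
import math


def akaike_weights(aic_values):
    """Convert a list of AIC values into Akaike weights that sum to 1."""
    best = min(aic_values)
    # Relative likelihood of each model given its AIC difference from the best
    rel = [math.exp(-0.5 * (aic - best)) for aic in aic_values]
    total = sum(rel)
    return [r / total for r in rel]


# Hypothetical AIC values for three candidate models of one species
weights = akaike_weights([100.0, 102.0, 110.0])
```

Models whose weights are close to each other, as reported for the models with and without region-by-landscape interactions, have similar support from the data.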

    Ecological boundary detection using Carlin–Chib Bayesian model selection

    Ralph Mac Nally
    ABSTRACT Sharp ecological transitions in space (ecotones, edges, boundaries) often are where ecologically important events occur, such as elevated or reduced biodiversity or altered ecological functions (e.g. changes in productivity, pollination rates or parasitism loads, nesting success). While human observers often identify these transitions by using intuitive or gestalt assignments (e.g. the boundary between a remnant woodland patch and the surrounding farm paddock seems obvious), it is clearly desirable to make statistical assessments based on measurements. These assessments often are straightforward to make if the data are univariate, but identifying boundaries or transitions using compositional or multivariate data sets is more difficult. There is a need for an intermediate step in which pairwise similarities between points or temporal samples are computed. Here, I describe an approach that treats points along a transect as alternative hypotheses (models) about the location of the boundary. Carlin and Chib (1995) introduced a Bayesian technique for comparing non-hierarchical models, which I adapted to compute the probabilities of each boundary location (i.e. a model) relative to the ensemble of models constituting the set of possible points of the boundary along the transect. Several artificial data sets and two field data sets (on vegetation and soils and on cave-dwelling invertebrates and microclimates) are used to illustrate the approach. The method can be extended to cases with several boundaries along a gradient, such as where there is an ecotone of non-zero thickness. [source]

    Meteorological factors affecting the diversity of airborne algae in an urban atmosphere

    ECOGRAPHY, Issue 5 2006
    Naveen K. Sharma
    Aeroalgal sampling of Varanasi City, India, was done using a Tilak Rotorod sampler and by exposing Petri plates of agarised Bold basal medium from March 2003 to February 2005. Amongst the 34 airborne algal genera recorded, cyanobacteria dominated the aero-algal flora, followed by green algae and diatoms. The generic diversity of airborne algae as well as the constituting groups exhibited seasonal variation. The most favored period for the appearance of cyanobacteria in the air was summer, while winter favored green algae. Presence of diatoms was almost uniform throughout the year. The presence of algal particles in the air depended upon the abundance and dynamics of the algal source and their release and dispersal in the atmosphere. Best model selection with Akaike information criteria indicated temperature, relative humidity, rainfall and wind velocity as the most important climatic factors determining algal diversity. These factors exert their effect both directly, by influencing entrainment and dispersal of algae from the source, and indirectly, by regulating the dynamics of the possible algal source (soil, water, plant body, wall and roof of the building) by supporting or inhibiting algal growth. In a closed environment and at low altitude, the sampling site characteristic is also an important factor. Open areas near the countryside had maximal aero-algal diversity. [source]