Data Set Consisting

Selected Abstracts


A new biodegradation prediction model specific to petroleum hydrocarbons

ENVIRONMENTAL TOXICOLOGY & CHEMISTRY, Issue 8 2005
Philip Howard
Abstract A new predictive model for determining quantitative primary biodegradation half-lives of individual petroleum hydrocarbons has been developed. This model uses a fragment-based approach similar to that of several other biodegradation models, such as those within the Biodegradation Probability Program (BIOWIN) estimation program. In the present study, a half-life in days is estimated using multiple linear regression against counts of 31 distinct molecular fragments. The model was developed using a data set consisting of 175 compounds with environmentally relevant experimental data that was divided into training and validation sets. The original fragments from the Ministry of International Trade and Industry BIOWIN model were used initially as structural descriptors and additional fragments were then added to better describe the ring systems found in petroleum hydrocarbons and to adjust for nonlinearity within the experimental data. The training and validation sets had r² values of 0.91 and 0.81, respectively.
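
The fragment-based regression described in this abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the data are synthetic, the split sizes are arbitrary, and the response is a generic number standing in for the half-life (the abstract does not say whether any transform was applied). It only shows the general pattern of regressing a response on fragment counts and scoring training and validation sets separately.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data: 175 "compounds", each described by counts of 31 fragments.
n_compounds, n_fragments = 175, 31
counts = rng.integers(0, 4, size=(n_compounds, n_fragments)).astype(float)
true_coef = rng.normal(0.0, 0.3, size=n_fragments)           # hypothetical fragment contributions
response = 1.0 + counts @ true_coef + rng.normal(0.0, 0.2, size=n_compounds)

# Random split into training and validation sets (sizes are assumed, not from the paper).
order = rng.permutation(n_compounds)
train, valid = order[:120], order[120:]


def design(x):
    """Prepend an intercept column to the fragment-count matrix."""
    return np.column_stack([np.ones(len(x)), x])


# Ordinary least squares on the training set only.
beta, *_ = np.linalg.lstsq(design(counts[train]), response[train], rcond=None)


def r_squared(x, y, coef):
    """Coefficient of determination of the fitted model on (x, y)."""
    resid = y - design(x) @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


r2_train = r_squared(counts[train], response[train], beta)
r2_valid = r_squared(counts[valid], response[valid], beta)
```

Scoring the validation set with coefficients fitted only on the training set is what makes the second r² an out-of-sample check, which is the role the validation set plays in the abstract.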


Sinusoidal modeling applied to spatially variant tropospheric ozone air pollution

ENVIRONMETRICS, Issue 6 2008
Nicholas Z. Muller
Abstract This paper demonstrates how parsimonious models of sinusoidal functions can be used to fit spatially variant time series in which there is considerable variation of a periodic type. A typical shortcoming of such tools relates to the difficulty in capturing idiosyncratic variation in periodic models. The strategy developed here addresses this deficiency. While previous work has sought to overcome the shortcoming by augmenting sinusoids with other techniques, the present approach employs station-specific sinusoids to supplement a common regional component, which succeeds in capturing local idiosyncratic behavior in a parsimonious manner. The experiments conducted herein reveal that a semi-parametric approach enables such models to fit spatially varying time series with periodic behavior in a remarkably tight fashion. The methods are applied to a panel data set consisting of hourly air pollution measurements. The augmented sinusoidal models produce an excellent fit to these data at three different levels of spatial detail. Copyright © 2007 John Wiley & Sons, Ltd.
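
The idea of supplementing a common regional sinusoid with station-specific sinusoids can be sketched as a two-stage least-squares fit. This is a simplified illustration, not the paper's estimator: the two-stage scheme, the single 24-hour period, and the synthetic three-station panel are all assumptions, and the semi-parametric element mentioned in the abstract is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(1)
hours = np.arange(24 * 14, dtype=float)   # two weeks of hourly readings


def basis(t, period=24.0):
    """Intercept plus a sine/cosine pair: linear in the amplitude and phase of one sinusoid."""
    w = 2.0 * np.pi * t / period
    return np.column_stack([np.ones_like(t), np.sin(w), np.cos(w)])


# Synthetic panel: a shared diurnal cycle plus a station-specific cycle and noise.
shared = 40.0 + 15.0 * np.sin(2.0 * np.pi * (hours - 9.0) / 24.0)
local_amp = [2.0, 6.0, 10.0]
local_shift = [0.0, 3.0, 6.0]
panel = np.stack([
    shared + a * np.sin(2.0 * np.pi * (hours - s) / 24.0) + rng.normal(0.0, 2.0, hours.size)
    for a, s in zip(local_amp, local_shift)
])

X = basis(hours)

# Stage 1: one regional sinusoid fitted to all stations pooled together.
pooled_X = np.tile(X, (panel.shape[0], 1))
regional_coef, *_ = np.linalg.lstsq(pooled_X, panel.ravel(), rcond=None)
regional_fit = X @ regional_coef

# Stage 2: a station-specific sinusoid fitted to each station's residuals.
fits = []
for series in panel:
    local_coef, *_ = np.linalg.lstsq(X, series - regional_fit, rcond=None)
    fits.append(regional_fit + X @ local_coef)
fits = np.array(fits)


def rmse(a, b):
    """Root-mean-square difference between two arrays (broadcasting allowed)."""
    return float(np.sqrt(np.mean((a - b) ** 2)))


rmse_regional = rmse(panel, regional_fit)   # regional curve alone, applied to every station
rmse_augmented = rmse(panel, fits)          # regional curve plus station-specific sinusoid
```

Because a phase-shifted sinusoid is a linear combination of a sine and a cosine of the same period, both stages reduce to ordinary linear least squares, which keeps the model parsimonious: three coefficients per station on top of three regional ones.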


Applying climatically associated species pools to the modelling of compositional change in tropical montane forests

GLOBAL ECOLOGY, Issue 2 2008
Duncan J. Golicher
ABSTRACT Aim: Predictive species distribution modelling is a useful tool for extracting the maximum amount of information from biological collections and floristic inventories. However, in many tropical regions records are only available from a small number of sites. This can limit the application of predictive modelling, particularly in the case of rare and endangered species. We aim to address this problem by developing a methodology for defining and mapping species pools associated with climatic variables in order to investigate potential species turnover and regional species loss under climate change scenarios combined with anthropogenic disturbance. Location: The study covered an area of 6800 km² in the highlands of Chiapas, southern Mexico. Methods: We derived climatically associated species pools from floristic inventory data using multivariate analysis combined with spatially explicit discriminant analysis. We then produced predictive maps of the distribution of tree species pools using data derived from 451 inventory plots. After validating the predictive power of potential distributions against an independent historical data set consisting of 3105 botanical collections, we investigated potential changes in the distribution of tree species resulting from forest disturbance and climate change. Results: Two species pools, associated with moist and cool climatic conditions, were identified as being particularly threatened by both climate change and ongoing anthropogenic disturbance. A change in climate consistent with low-emission scenarios of general circulation models was shown to be sufficient to cause major changes in equilibrium forest composition within 50 years. The same species pools were also found to be suffering the fastest current rates of deforestation and internal forest disturbance. Disturbance and deforestation, in combination with climate change, threaten the regional distributions of five tree species listed as endangered by the IUCN. These include the endemic species Magnolia sharpii Miranda and Wimmeria montana Lundell. Eleven vulnerable species and 34 species requiring late successional conditions for their regeneration could also be threatened. Main conclusions: Climatically associated species pools can be derived from floristic inventory data available for tropical regions using methods based on multivariate analysis even when data limitations prevent effective application of individual species modelling. Potential consequences of climate change and anthropogenic disturbance on the species diversity of montane tropical forests in our study region are clearly demonstrated by the method.


Droughts and extreme events in regional daily Italian precipitation series

INTERNATIONAL JOURNAL OF CLIMATOLOGY, Issue 5 2002
Michele Brunetti
Abstract This paper proposes a methodology to study daily precipitation series that include a significant proportion of missing data, without resorting to completion methods based on randomly generated numbers. It is applied to a data set consisting of 75 station records (1951–2000) covering the Italian territory. They are clustered by principal component analysis into six regions: the north-west, the northern part of the north-east, the southern part of the north-east, the centre, the south and the islands (i.e. Sicily and Sardinia). Complete annual and seasonal regional average series are obtained from the incomplete station records, and analysed for droughts and extreme precipitation events. Droughts are identified by means of two indicators: the longest dry period and the proportion of dry days. The most remarkable result is a systematic increase in winter droughts over all of Italy, especially in the north, due mainly to the very dry 1987–93 period. Extreme events are analysed considering 5-day regional totals. In this case, however, an attempt to search for a statistically significant trend is not successful because of the scarcity of events in such a short period. The reliability of the regional series is checked by computing some basic statistics concerning total precipitation, rainy days and precipitation intensity and comparing them with the same statistics computed for regional series obtained by station records completed with methods based on random number generators.
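
The two drought indicators named in this abstract, the longest dry period and the proportion of dry days, are simple to compute from a daily series. The sketch below is illustrative only: the 1.0 mm wet-day threshold and the treatment of missing days (excluded from the proportion, and treated as interrupting a dry spell) are assumptions, not the paper's rules.

```python
import math


def drought_indicators(precip_mm, wet_threshold=1.0):
    """Return (longest dry spell in days, proportion of dry days) for one daily series.

    Missing observations may be given as None or NaN. They are excluded from the
    proportion and they interrupt any dry spell in progress (an assumed convention).
    """
    longest = current = dry_days = valid_days = 0
    for value in precip_mm:
        missing = value is None or (isinstance(value, float) and math.isnan(value))
        if missing:
            current = 0
            continue
        valid_days += 1
        if value < wet_threshold:
            dry_days += 1
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    proportion = dry_days / valid_days if valid_days else float("nan")
    return longest, proportion


# Nine days with one missing value: six of the eight observed days are dry.
example = [0, 0, 5, None, 0, 0, 0, 2, 0]
print(drought_indicators(example))   # (3, 0.75)
```

Computing the proportion over observed days only is one way to use incomplete records without filling gaps with generated values, in the spirit of the approach the abstract describes.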


Nonlinear quantitative structure-property relationship modeling of skin permeation coefficient

JOURNAL OF PHARMACEUTICAL SCIENCES, Issue 11 2009
Brian J. Neely
Abstract The permeation coefficient characterizes the ability of a chemical to penetrate the dermis, and the current study describes our efforts to develop structure-based models for the permeation coefficient. Specifically, we have integrated nonlinear, quantitative structure-property relationship (QSPR) models, genetic algorithms (GAs), and neural networks to develop a reliable model. Case studies were conducted to investigate the effects of structural attributes on permeation using a carefully characterized database. Upon careful evaluation, a permeation coefficient data set consisting of 333 data points for 258 molecules was identified, and these data were added to our extensive thermophysical database. Of these data, permeation values for 160 molecular structures were deemed suitable for our modeling efforts. We employed established descriptors and constructed new descriptors to aid the development of a reliable QSPR model for the permeation coefficient. Overall, our new nonlinear QSPR model had an absolute-average percentage deviation, root-mean-square error, and correlation coefficient of 8.0%, 0.34, and 0.93, respectively. Cause-and-effect analysis of the structural descriptors obtained in this study indicates that three size/shape and two polarity descriptors accounted for ~70% of the permeation information conveyed by the descriptors. © 2009 Wiley-Liss, Inc. and the American Pharmacists Association J Pharm Sci 98:4069–4084, 2009
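
The three summary statistics quoted in this abstract can be computed as follows. The formulas are the conventional definitions and are assumed here, since the abstract does not spell them out; the example values are invented and are not data from the study.

```python
import numpy as np


def fit_statistics(observed, predicted):
    """Return (absolute-average percentage deviation, RMSE, Pearson correlation)."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    # Mean of |relative error|, expressed in percent; undefined if any observed value is 0.
    aapd = 100.0 * np.mean(np.abs((predicted - observed) / observed))
    # Root-mean-square error, in the units of the modelled quantity.
    rmse = np.sqrt(np.mean((predicted - observed) ** 2))
    # Pearson correlation between observed and predicted values.
    r = np.corrcoef(observed, predicted)[0, 1]
    return float(aapd), float(rmse), float(r)


# Invented values for illustration only.
aapd, rmse, r = fit_statistics([-2.0, -4.0, -5.0], [-2.2, -3.6, -5.0])
```

The percentage deviation is scale-free while the RMSE carries the units of the modelled quantity, which is why the abstract can report 8.0% and 0.34 side by side.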


A Comparison of Neural Network, Statistical Methods, and Variable Choice for Life Insurers' Financial Distress Prediction

JOURNAL OF RISK AND INSURANCE, Issue 3 2006
Patrick L. Brockett
This study examines the effect of the statistical/mathematical model selected and the variable set considered on the ability to identify financially troubled life insurers. Models considered are two artificial neural network methods (back-propagation and learning vector quantization (LVQ)) and two more standard statistical methods (multiple discriminant analysis and logistic regression analysis). The variable sets considered are the insurance regulatory information system (IRIS) variables, the financial analysis solvency tracking (FAST) variables, and Texas early warning information system (EWIS) variables, and a data set consisting of twenty-two variables selected by us in conjunction with the research staff at TDI and a review of the insolvency prediction literature. The results show that the back-propagation (BP) and LVQ outperform the traditional statistical approaches for all four variable sets with a consistent superiority across the two different evaluation criteria (total misclassification cost and resubstitution risk criteria), and that the twenty-two variables and the Texas EWIS variable sets are more efficient than the IRIS and the FAST variable sets for identification of financially troubled life insurers in most comparisons.


Biplots of compositional data

JOURNAL OF THE ROYAL STATISTICAL SOCIETY: SERIES C (APPLIED STATISTICS), Issue 4 2002
John Aitchison
Summary. The singular value decomposition and its interpretation as a linear biplot have proved to be a powerful tool for analysing many forms of multivariate data. Here we adapt biplot methodology to the specific case of compositional data consisting of positive vectors each of which is constrained to have unit sum. These relative variation biplots have properties relating to the special features of compositional data: the study of ratios, subcompositions and models of compositional relationships. The methodology is applied to a data set consisting of six-part colour compositions in 22 abstract paintings, showing how the singular value decomposition can achieve an accurate biplot of the colour ratios and how possible models interrelating the colours can be diagnosed.
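
A biplot of compositional data via the singular value decomposition can be sketched as below. The sketch assumes the usual route of a centred log-ratio transform followed by column centring and an SVD; the marker scaling chosen here is one of several conventions and may differ from the paper's, and the 22 six-part compositions are random stand-ins rather than the paintings data.

```python
import numpy as np


def relative_variation_biplot(compositions):
    """Return (row markers, column markers, share of variance shown) for a 2-D biplot."""
    x = np.asarray(compositions, dtype=float)
    x = x / x.sum(axis=1, keepdims=True)              # close every row to unit sum
    logx = np.log(x)                                  # requires strictly positive parts
    clr = logx - logx.mean(axis=1, keepdims=True)     # centred log-ratio transform
    z = clr - clr.mean(axis=0, keepdims=True)         # centre each part across samples
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    rows = u[:, :2] * s[:2]                           # sample markers (assumed scaling)
    cols = vt[:2].T                                   # part markers
    explained = float((s[:2] ** 2).sum() / (s ** 2).sum())
    return rows, cols, explained


rng = np.random.default_rng(3)
paintings = rng.dirichlet(np.ones(6), size=22)        # synthetic six-part compositions
rows, cols, explained = relative_variation_biplot(paintings)
```

Working on log-ratios rather than raw proportions is what lets the display respect the unit-sum constraint: differences between part markers correspond to log-ratios between parts, which are the quantities the abstract says the biplot represents.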


Measuring temporal variability in residential magnetic field exposures

BIOELECTROMAGNETICS, Issue 4 2001
W.T. Kaune
Abstract Considerable interest has developed during the past ten years regarding the hypothesis that living organisms may respond to temporal variability in ELF magnetic fields to which they are exposed. Consequently, methods to measure various aspects of temporal variability are of interest. In this paper, five measures of temporal variability were examined: arithmetic means (Dmean) and rms values (Drms) of the first differences (i.e., absolute value of the difference between consecutive measurements) of magnetic field recordings; "standardized" forms of Drms, denoted RCMS, obtained by dividing Drms by the standard deviations of the magnetic field data; and mean (Fmean) and rms (Frms) values of fractional first differences. Theoretical investigations showed that Dmean and Drms are virtually unaffected by long-term systematic trends (changes) in exposure. These measures thus provide rather specific measures of short-term temporal variability. This was also true to a lesser extent for Fmean and Frms. In contrast, the RCMS metric was affected by both short-term and long-term exposure variabilities. The metrics were also investigated using a data set consisting of twice-repeated two-calendar-day recordings of bedroom magnetic fields and personal exposures of 203 women residing in the western portion of Washington State. The predominant source of short-term temporal variability in magnetic field exposures arose from the movement of subjects through spatially varying magnetic fields. Spearman correlations between TWA bedroom magnetic fields or TWA personal exposures and five measures of temporal variability were relatively low. Weak to moderate levels of correlation were observed between temporal variability measured during two different sessions separated in time by 3 or 6 months. We conclude that first difference and fractional difference metrics provide specific and fairly independent measures of short-term temporal variability. The RCMS metric does not provide an easily interpreted measure of short-term or long-term temporal variability. This last result raises uncertainties about the interpretation of published studies that use the RCMS metric. Bioelectromagnetics 22:232–245, 2001. © 2001 Wiley-Liss, Inc.
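
Three of the five measures are defined explicitly enough in this abstract to sketch directly: Dmean, Drms and RCMS. The code below follows those definitions on a synthetic recording and then adds a slow linear trend to illustrate the abstract's claim about long-term changes. The fractional measures are omitted because their exact definition is not given here; the use of the sample standard deviation and the synthetic series are assumptions.

```python
import numpy as np


def variability_metrics(field):
    """Return (Dmean, Drms, RCMS) for one magnetic field recording."""
    b = np.asarray(field, dtype=float)
    first_diff = np.abs(np.diff(b))                  # |difference between consecutive readings|
    d_mean = float(first_diff.mean())
    d_rms = float(np.sqrt(np.mean(first_diff ** 2)))
    rcms = d_rms / float(b.std(ddof=1))              # Drms divided by the standard deviation
    return d_mean, d_rms, rcms


rng = np.random.default_rng(4)
recording = rng.normal(1.0, 0.1, 2000)               # synthetic short-term fluctuations
trend = np.linspace(0.0, 2.0, recording.size)        # slow systematic drift (assumed shape)

flat = variability_metrics(recording)
drifting = variability_metrics(recording + trend)
```

The drift adds only a tiny amount to each consecutive difference, so Dmean and Drms barely move, while it inflates the standard deviation in the denominator of RCMS. That is the mechanism behind the abstract's conclusion that RCMS mixes short-term and long-term variability.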


Resonance Structures of the Amide Bond: The Advantages of Planarity

CHEMISTRY - A EUROPEAN JOURNAL, Issue 27 2006
Jon I. Mujika
Abstract Delocalization indexes based on magnitudes derived from electron-pair densities are demonstrated to be useful indicators of electron resonance in amides. These indexes, based on the integration of the two-electron density matrix over the atomic basins defined through the zero-flux condition, have been calculated for a series of amides at the B3LYP/6-31+G* level of theory. These quantities, which can be viewed as a measure of the sharing of electrons between atoms, behave in concordance with the traditional resonance model, even though they are integrated in Bader atomic basins. Thus, the use of these quantities overcomes contradictory results from analyses of atomic charges, yet keeps the theoretical appeal of using nonarbitrary atomic partitions and unambiguously defined functions such as densities and pair densities. Moreover, for a large data set consisting of 24 amides plus their corresponding rotational transition states, a linear relation was found between the rotational barrier for the amide and the delocalization index between the nitrogen and oxygen atoms, indicating that this parameter can be used as an ideal physical-chemical indicator of the electron resonance in amides.


Cross-correlated and conventional dipolar carbon-13 relaxation in methylene groups in small, symmetric molecules

CONCEPTS IN MAGNETIC RESONANCE, Issue 2 2007
Leila Ghalebani
Abstract A theory for dipolar cross-correlated relaxation processes in AMX or AX2 spin systems, with special reference to 13C-methylene groups, is reviewed briefly. Simple experiments and protocols for measuring the transfer rates between the carbon-13 Zeeman order and the three-spin order, and for their analogues in the transverse plane, are discussed using a concentrated solution of the disaccharide trehalose as a model system. Experimental data sets consisting of conventional carbon-13 relaxation parameters (T1, T2, and NOE), along with the cross-correlated relaxation rates, are also presented for some small, rigid, polycyclic molecules. These data are interpreted using spectral density functions appropriate to spherical or symmetric tops reorienting according to the small-step rotational diffusion model. The analysis results in a consistent picture of the auto- and cross-correlated spin relaxation processes. © 2007 Wiley Periodicals, Inc. Concepts Magn Reson Part A 30A: 100–115, 2007.


Generating Dichotomous Item Scores with the Four-Parameter Beta Compound Binomial Model

JOURNAL OF EDUCATIONAL MEASUREMENT, Issue 3 2007
Patrick O. Monahan
A Monte Carlo simulation technique for generating dichotomous item scores is presented that implements (a) a psychometric model with different explicit assumptions than traditional parametric item response theory (IRT) models, and (b) item characteristic curves without restrictive assumptions concerning mathematical form. The four-parameter beta compound-binomial (4PBCB) strong true score model (with two-term approximation to the compound binomial) is used to estimate and generate the true score distribution. The nonparametric item-true score step functions are estimated by classical item difficulties conditional on proportion-correct total score. The technique performed very well in replicating inter-item correlations, item statistics (point-biserial correlation coefficients and item proportion-correct difficulties), first four moments of total score distribution, and coefficient alpha of three real data sets consisting of educational achievement test scores. The technique replicated real data (including subsamples of differing proficiency) as well as the three-parameter logistic (3PL) IRT model (and much better than the 1PL model) and is therefore a promising alternative simulation technique. This 4PBCB technique may be particularly useful as a more neutral simulation procedure for comparing methods that use different IRT models.


Fine-scale genetic structure and gene dispersal inferences in 10 Neotropical tree species

MOLECULAR ECOLOGY, Issue 2 2006
OLIVIER J. HARDY
Abstract The extent of gene dispersal is a fundamental factor of the population and evolutionary dynamics of tropical tree species, but directly monitoring seed and pollen movement is a difficult task. However, indirect estimates of historical gene dispersal can be obtained from the fine-scale spatial genetic structure of populations at drift–dispersal equilibrium. Using an approach that is based on the slope of the regression of pairwise kinship coefficients on spatial distance and estimates of the effective population density, we compare indirect gene dispersal estimates of sympatric populations of 10 tropical tree species. We re-analysed 26 data sets consisting of mapped allozyme, SSR (simple sequence repeat), RAPD (random amplified polymorphic DNA) or AFLP (amplified fragment length polymorphism) genotypes from two rainforest sites in French Guiana. Gene dispersal estimates were obtained for at least one marker in each species, although the estimation procedure failed under insufficient marker polymorphism, limited sample size, or inappropriate sampling area. Estimates generally suffered low precision and were affected by assumptions regarding the effective population density. Averaging estimates over data sets, the extent of gene dispersal ranged from 150 m to 1200 m according to species. Smaller gene dispersal estimates were obtained in species with heavy diaspores, which are presumably not well dispersed, and in populations with high local adult density. We suggest that limited seed dispersal could indirectly limit effective pollen dispersal by creating higher local tree densities, thereby increasing the positive correlation between pollen and seed dispersal distances. We discuss the potential and limitations of our indirect estimation procedure and suggest guidelines for future studies.


Testing for linkage and Hardy-Weinberg disequilibrium

ANNALS OF HUMAN GENETICS, Issue 2 2009
E. Kulinskaya
Summary This paper concerns several important points when testing for Hardy-Weinberg equilibrium (HWE) and linkage disequilibrium (LD) in genetics. First, we challenge the necessity of using exclusively two-sided tests for LD. Next, we show that the exact 2-sided tests based on the most popular measures of LD are not equivalent, and neither are the standard statistical tests even though the 1-sided tests are equivalent. We show how this results in different inference about LD for two data sets consisting of small groups of markers. Finally, we advocate the use of the conditional p-value for both LD and HWE testing. An important advantage of this p-value is that equivalent 1-sided tests are transformed into equivalent 2-sided tests.


Developing a modern pollen–climate calibration data set for Norway

BOREAS, Issue 4 2010
ANNE E. BJUNE
Bjune, A. E., Birks, H. J. B., Peglar, S. M. & Odland, A. 2010: Developing a modern pollen–climate calibration data set for Norway. Boreas, Vol. 39, pp. 674–688. 10.1111/j.1502-3885.2010.00158.x. ISSN 0300-9483. Modern pollen–climate data sets consisting of modern pollen assemblages and modern climate data (mean July temperature and mean annual precipitation) have been developed for Norway based on 191 lakes and 321 lakes. The original 191-lake data set was designed to optimize the distribution of the lakes sampled along the mean July temperature gradient, thereby fulfilling one of the most critical assumptions of weighted-averaging regression and calibration and its relative, weighted-averaging partial least-squares regression. A further 130 surface samples of comparable taphonomy, taxonomic detail and analyst became available as a result of other projects. These 130 samples, all from new lakes, were added to the 191-lake data set to create the 321-lake data set. The collection and construction of these data sets are outlined. Numerical analyses involving generalized linear modelling, constrained ordination techniques, weighted-averaging partial least-squares regression, and two different cross-validation procedures are used to assess the effects of increasing the size of the calibration data set from 191 to 321 lakes. The two data sets are used to reconstruct mean July temperature and mean annual precipitation for a Holocene site in northwest Norway and a Lateglacial site in west-central Norway. Overall, little is to be gained by increasing the modern data set beyond about 200 lakes in terms of modern model performance statistics, but the down-core reconstructions show less between-sample variability and are thus potentially more plausible and realistic when based on the 321-lake data set.
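
Weighted-averaging regression and calibration, mentioned in this abstract, can be sketched in a few lines. This is a plain weighted-averaging model with inverse deshrinking, not the partial least-squares variant the study also uses. Leave-one-out is assumed as the cross-validation scheme because the abstract does not name its two procedures, and the 191 "lakes" and 20 "taxa" are simulated, with every taxon present in every sample.

```python
import numpy as np


def wa_fit(abund, env):
    """Weighted-averaging regression: taxon optima plus an inverse deshrinking line."""
    optima = (abund * env[:, None]).sum(axis=0) / abund.sum(axis=0)
    initial = (abund * optima).sum(axis=1) / abund.sum(axis=1)
    slope, intercept = np.polyfit(initial, env, 1)    # regress observed on initial estimates
    return optima, intercept, slope


def wa_predict(abund, model):
    """Weighted-averaging calibration: abundance-weighted mean of optima, then deshrink."""
    optima, intercept, slope = model
    initial = (abund * optima).sum(axis=1) / abund.sum(axis=1)
    return intercept + slope * initial


def loo_rmsep(abund, env):
    """Leave-one-out root-mean-square error of prediction."""
    n = len(env)
    errors = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        model = wa_fit(abund[keep], env[keep])
        errors[i] = wa_predict(abund[i:i + 1], model)[0] - env[i]
    return float(np.sqrt(np.mean(errors ** 2)))


# Simulated calibration set: unimodal taxon responses along a July temperature gradient.
rng = np.random.default_rng(2)
n_lakes, n_taxa = 191, 20
july_temp = rng.uniform(6.0, 16.0, n_lakes)
true_optima = np.linspace(4.0, 18.0, n_taxa)
response = np.exp(-((july_temp[:, None] - true_optima) ** 2) / (2 * 2.5 ** 2))
counts = response * rng.lognormal(0.0, 0.3, size=response.shape)
pollen = counts / counts.sum(axis=1, keepdims=True)

rmsep = loo_rmsep(pollen, july_temp)
```

The method estimates each taxon's optimum as an abundance-weighted mean of the environmental variable, so it depends on the lakes covering the gradient evenly. That dependence is why the abstract stresses the design of the 191-lake set along the July temperature gradient.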