Large Datasets

Selected Abstracts


FORECASTING AUSTRALIAN MACROECONOMIC VARIABLES USING A LARGE DATASET

AUSTRALIAN ECONOMIC PAPERS, Issue 1 2010
SARANTIS TSIAPLIAS
This paper investigates the forecasting performance of the diffusion index approach for the Australian economy, and considers it relative to composite forecasts. Weighted and unweighted factor forecasts are benchmarked against composite forecasts and forecasts derived from individual forecasting models. The results suggest that diffusion index forecasts tend to improve on the benchmark AR forecasts. We also observe that weighted factors tend to produce better forecasts than their unweighted counterparts. We find, however, that the size of the forecasting improvement is less marked than in previous research, with the diffusion index forecasts typically producing mean square errors of a similar magnitude to the VAR and BVAR approaches. [source]
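
For readers unfamiliar with the mechanics, the sketch below illustrates diffusion-index forecasting in the style the paper evaluates; it is not the authors' code. Factors are extracted from a synthetic panel by principal components and fed into a one-step-ahead regression, with a recursive out-of-sample comparison against an AR(1) benchmark; the data, dimensions and lag choices are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N, k = 240, 80, 3                        # periods, panel width, factors

# Synthetic panel: N predictor series driven by k common factors plus noise
F = rng.standard_normal((T, k))
X = F @ rng.standard_normal((k, N)) + rng.standard_normal((T, N))
y = F @ np.array([0.8, -0.5, 0.3]) + 0.5 * rng.standard_normal(T)

def pc_factors(X, k):
    """Estimate factors by principal components on the standardized panel."""
    Z = (X - X.mean(0)) / X.std(0)
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, :k] * s[:k]

def ols(Z, y):
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return beta

# Recursive one-step-ahead forecasts: AR(1) benchmark vs diffusion index
err_ar, err_di = [], []
for t in range(120, T - 1):
    Fh = pc_factors(X[: t + 1], k)
    b_ar = ols(np.column_stack([np.ones(t), y[:t]]), y[1 : t + 1])
    err_ar.append(y[t + 1] - np.array([1.0, y[t]]) @ b_ar)
    b_di = ols(np.column_stack([np.ones(t), Fh[:t], y[:t]]), y[1 : t + 1])
    err_di.append(y[t + 1] - np.concatenate(([1.0], Fh[t], [y[t]])) @ b_di)

print("MSE ratio (diffusion index / AR):",
      np.mean(np.square(err_di)) / np.mean(np.square(err_ar)))
```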


Deficit in community species richness as explained by area and isolation of sites

DIVERSITY AND DISTRIBUTIONS, Issue 3 2000
Hans Henrik Bruun
Abstract The potential community species richness was predicted for 85 patches of seminatural grassland in an agricultural landscape in Denmark. The basis of the prediction was a very large dataset on the vegetation, soil pH and topography in Danish grasslands and related communities. Species were assigned to potential species pools according to their preferences regarding soil acidity and water availability (expressed as potential solar irradiation), and to the ranges in these two factors observed in each grassland patch. The difference between the predicted and the observed patch-level species richness, the community richness deficit, varied considerably among patches. The community richness deficit exhibited a negative relationship with patch area and, for small patches, a positive relationship with patch isolation. [source]


A minimum sample size required from Schmidt hammer measurements

EARTH SURFACE PROCESSES AND LANDFORMS, Issue 13 2009
Tomasz Niedzielski
Abstract The Schmidt hammer is a useful tool applied by geomorphologists to measure rock strength in field conditions. The essence of field application is to obtain a sufficiently large dataset of individual rebound values, which yields a meaningful numerical value of mean strength. Although there is general agreement that a certain minimum sample size is required to proceed with the statistics, the choice of size (i.e. number of individual impacts) has usually been intuitive and arbitrary. In this paper we show a simple statistical method, based on the two-sample Student's t-test, to objectively estimate the minimum number of rebound measurements. We present the results as (1) the 'mean' and 'median' solutions, each providing a single estimate value, and (2) the empirical probability distribution of such estimates based on many field samples. Schmidt hammer data for 14 lithologies, 13–81 samples for each, with each sample consisting of 40 individual readings, have been evaluated, assuming different significance levels. The principal recommendations are: (1) the recommended minimum sample size for weak and moderately strong rock is 25; (2) a sample size of 15 is sufficient for sandstones and shales; (3) strong and coarse rocks require 30 readings at a site; (4) the minimum sample size may be reduced by one-third if the context of research allows a higher significance level for the test statistics. Interpretations based on fewer than 10 readings from a site should definitely be avoided. Copyright © 2009 John Wiley & Sons, Ltd. [source]
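
The paper's estimates come from field samples; a minimal stand-in for the underlying logic is a standard two-sample t-test power calculation, shown below. The effect sizes, power target and use of statsmodels are assumptions, not the authors' procedure.

```python
# Hedged sketch: minimum readings per sample so a two-sample t-test can
# detect an assumed standardized difference in mean rebound values.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for alpha in (0.05, 0.10):             # a higher alpha permits a smaller n
    for d in (0.8, 1.0):               # assumed standardized effect sizes
        n = analysis.solve_power(effect_size=d, alpha=alpha, power=0.9)
        print(f"alpha={alpha}, effect size={d}: about {n:.0f} readings per sample")
```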


Species–energy relationships and habitat complexity in bird communities

ECOLOGY LETTERS, Issue 8 2004
Allen H. Hurlbert
Abstract Species–energy theory is a commonly invoked theory predicting a positive relationship between species richness and available energy. The More Individuals Hypothesis (MIH) attempts to explain this pattern, assuming that areas with greater food resources support more individuals and that communities with more individuals include more species. Using a large dataset for North American birds, I tested these predictions of the MIH, and also examined the effect of habitat complexity on community structure. I found qualitative support for the relationships predicted by the MIH; however, the MIH alone was inadequate to fully explain richness patterns. Communities in more productive sites had more individuals, but they also had more even relative abundance distributions, such that a given number of individuals yielded a greater number of species. Richness and evenness were also higher in structurally complex forests than in structurally simpler grasslands when controlling for available energy. [source]


An improved AMC-coupled runoff curve number model

HYDROLOGICAL PROCESSES, Issue 20 2010
Ram Kumar Sahu
Abstract In the Soil Conservation Service Curve Number (SCS-CN) method, the three levels of antecedent moisture condition (AMC) permit unreasonable sudden jumps in curve numbers, which result in corresponding jumps in the estimated runoff. A few recently developed SCS-CN-based models obviate this problem, yet they have several limitations. In this study, such a model incorporating a continuous function for antecedent moisture is presented. It has several advantages over the other existing SCS-CN-based models. Application to a large dataset from US watersheds showed it to perform better than the existing SCS-CN method and the other models based on it. Copyright © 2010 John Wiley & Sons, Ltd. [source]
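
As background, here is a sketch of the standard SCS-CN runoff equation that such models build on (not the improved AMC-coupled model itself); the SI units and the common initial abstraction convention Ia = 0.2S are the textbook ones.

```python
# Standard SCS-CN runoff: Q = (P - Ia)^2 / (P - Ia + S) for P > Ia, else 0,
# with S = 25400/CN - 254 (mm). The improved models replace the three
# discrete AMC levels with a continuous soil-moisture function; here CN
# is simply a fixed input.
import numpy as np

def scs_cn_runoff(P_mm, CN, ia_ratio=0.2):
    """Direct runoff depth (mm) for rainfall depth P_mm and curve number CN."""
    S = 25400.0 / CN - 254.0          # potential maximum retention (mm)
    Ia = ia_ratio * S                 # initial abstraction (mm)
    P = np.asarray(P_mm, dtype=float)
    return np.where(P > Ia, (P - Ia) ** 2 / (P - Ia + S), 0.0)

print(scs_cn_runoff([10, 25, 50, 100], CN=75))
```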


Distribution-based anomaly detection in 3G mobile networks: from theory to practice

INTERNATIONAL JOURNAL OF NETWORK MANAGEMENT, Issue 5 2010
Alessandro D'Alconzo
The design of anomaly detection (AD) methods for network traffic has been intensively investigated by the research community in recent years. However, less attention has been devoted to the issues that eventually arise when deploying such tools in a real operational context. We designed a statistics-based change-detection algorithm for identifying deviations in distribution time series. The proposed method has been applied to the analysis of a large dataset from an operational 3G mobile network, with a view to the adoption of such a tool in production. Our algorithm is designed to cope with the marked non-stationarity and daily/weekly seasonality that characterize the traffic mix in a large public network. Several practical issues emerged during the study, including the need to handle incompleteness of the collected data, the difficulty of drilling down to the cause of certain alarms, and the need for human assistance in resetting the algorithm after a persistent change in network configuration (e.g. a capacity upgrade). We report on our practical experience, highlighting the key lessons learned and the hands-on experience gained from such an analysis. Finally, we propose a novel methodology based on semi-synthetic traces for tuning and performance assessment of the proposed AD algorithm. Copyright © 2010 John Wiley & Sons, Ltd. [source]
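
A minimal sketch of distribution-based detection in this spirit, not the authors' algorithm: each observation window's histogram is compared with a reference set via the Jensen-Shannon distance, with a threshold learned from the references' internal variability. The traffic here is synthetic, and the gamma distributions, bin grid and 99th-percentile threshold are assumptions.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

rng = np.random.default_rng(1)
bins = np.linspace(0, 10, 41)

def histogram(sample):
    h, _ = np.histogram(sample, bins=bins)
    return (h + 1e-9) / (h + 1e-9).sum()          # smoothed, normalized

# Reference distributions, e.g. the same time-of-day slot on past days
reference = [histogram(rng.gamma(2.0, 1.0, 5000)) for _ in range(20)]

# Data-driven threshold from the reference set's internal variability
internal = [jensenshannon(p, q) for p in reference for q in reference if p is not q]
threshold = np.percentile(internal, 99)

def is_anomalous(sample):
    p = histogram(sample)
    d = min(jensenshannon(p, q) for q in reference)   # distance to best match
    return d > threshold, round(float(d), 4)

print(is_anomalous(rng.gamma(2.0, 1.0, 5000)))   # typical traffic -> False
print(is_anomalous(rng.gamma(4.0, 1.5, 5000)))   # shifted traffic mix -> True
```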


Duality revisited: Construction of fractional frequency distributions based on two dual Lotka laws

JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, Issue 10 2002
L. Egghe
Fractional frequency distributions of, for example, authors with a certain (fractional) number of papers are very irregular and, therefore, not easy to model or to explain. This article makes a first attempt at this by assuming two simple Lotka laws (with exponent 2): one for the number of authors with n papers (total counts here) and one for the number of papers with n authors, n ∈ ℕ. Based on a convolution model developed earlier by Egghe, now interpreted and reworked for discrete scores, we are able to produce theoretical fractional frequency distributions with only one parameter, which are in very close agreement with the empirical ones found in a large dataset produced earlier by Rao. The article also shows that (irregular) fractional frequency distributions are a consequence of Lotka's law, and are not examples of breakdowns of this famous historical law. [source]
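
To see why fractional scores come out irregular, the simulation below draws both an author's paper count and each paper's author count from Lotka laws with exponent 2 and accumulates fractional credits of 1/m per m-author paper. This is a loose illustration of the setting, not Egghe's convolution model; the truncation point and sample size are arbitrary.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(2)
n = np.arange(1, 21)
lotka = 1.0 / n**2
lotka /= lotka.sum()                 # P(author has n papers), truncated at 20

scores = []
for _ in range(50_000):
    papers = rng.choice(n, p=lotka)                 # papers by this author
    # each paper's author count follows the dual Lotka law; the author's
    # fractional credit for a paper with m authors is 1/m
    scores.append(sum(1.0 / rng.choice(n, p=lotka) for _ in range(papers)))

# the distribution concentrates on simple fractions (1, 0.5, 2, 1/3, ...)
print(Counter(np.round(scores, 3)).most_common(6))
```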


The determinants of voluntary investment decisions

MANAGERIAL AND DECISION ECONOMICS, Issue 8 2001
Wendy Chapple
This paper analyses firms' investments in areas of corporate social responsibility, focussing on the decision whether or not to invest in compliance with voluntary environmental standards. Theoretical predictions of the compliance decision are tested using discrete-time survival analysis on a large dataset of UK manufacturing firms. The rate of voluntary compliance is found to have increased since the introduction of the International Standards Organization (ISO) scheme. Further, voluntary compliance is found to be negatively associated with rates of return and industry share, and positively associated with capital intensity and industry export intensity. In contrast to theoretical predictions on corporate social responsibility, there is no evidence that investment in intangible assets, either at the firm or the industry level, is positively associated with the compliance decision. Copyright © 2001 John Wiley & Sons, Ltd. [source]


The relative role of drift and selection in shaping the human skull

AMERICAN JOURNAL OF PHYSICAL ANTHROPOLOGY, Issue 1 2010
Lia Betti
Abstract Human populations across the world vary greatly in cranial morphology. It is highly debated to what extent this variability has accumulated through neutral processes (genetic drift) or through natural selection driven by climate. By taking advantage of recent work showing that geographic distance along landmasses is an excellent proxy for neutral genetic differentiation, we quantify the relative role of drift versus selection in an exceptionally large dataset of human skulls. We show that neutral processes have been much more important than climate in shaping the human cranium. We further demonstrate that a large proportion of the signal for natural selection comes from populations from extremely cold regions. More generally, we show that, if drift is not explicitly accounted for, the effect of natural selection can be greatly overestimated. Am J Phys Anthropol, 2010. © 2009 Wiley-Liss, Inc. [source]


EWMA techniques for computer intrusion detection through anomalous changes in event intensity

QUALITY AND RELIABILITY ENGINEERING INTERNATIONAL, Issue 6 2002
Nong Ye
Abstract Intrusion detection is used to monitor and capture intrusions into computer and network systems which attempt to compromise their security. To protect information systems from intrusions, and thus assure their reliability and quality of service, it is highly desirable to develop techniques that detect intrusions. Many intrusions manifest as dramatic changes in the intensity of events occurring in information systems. Because exponentially weighted moving average (EWMA) control charts can monitor the rate of occurrence of events based on their intensity, we apply three EWMA statistics to detect anomalous changes in event intensity for intrusion detection: the EWMA chart for autocorrelated data, the EWMA chart for uncorrelated data, and the EWMA chart for monitoring the process standard deviation. The objectives of this paper are to provide design procedures for realizing these control charts and to investigate their performance under different parameter settings based on one large dataset. The early detection capability of these EWMA techniques is also examined to provide guidance about the design capacity of information systems. Copyright © 2002 John Wiley & Sons, Ltd. [source]
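
Below is a minimal EWMA chart for uncorrelated data, one of the three statistics mentioned; the control-limit formula is the standard asymptotic one, and the Poisson event-intensity data, lambda = 0.2 and L = 3 are textbook-style assumptions rather than the paper's settings.

```python
import numpy as np

def ewma_alarms(x, mu0, sigma, lam=0.2, L=3.0):
    """EWMA chart for uncorrelated data with asymptotic control limits."""
    halfwidth = L * sigma * np.sqrt(lam / (2.0 - lam))
    z, alarms = mu0, []
    for t, xt in enumerate(x):
        z = lam * xt + (1.0 - lam) * z        # z_t = lam*x_t + (1-lam)*z_{t-1}
        if abs(z - mu0) > halfwidth:
            alarms.append(t)
    return alarms

rng = np.random.default_rng(3)
baseline = rng.poisson(20, 200)               # normal event intensity
attack = rng.poisson(30, 50)                  # intensity jump: an "intrusion"
x = np.concatenate([baseline, attack]).astype(float)

alarms = ewma_alarms(x, mu0=20.0, sigma=np.sqrt(20.0))
print("first alarm in the attack window: t =", next(a for a in alarms if a >= 200))
```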


Observations on the detection of b- and y-type ions in the collisionally activated decomposition spectra of protonated peptides

RAPID COMMUNICATIONS IN MASS SPECTROMETRY, Issue 10 2009
King Wai Lau
Tandem mass spectrometric data from peptides are routinely used in an unsupervised manner to infer product ion sequence and hence the identity of their parent protein. However, significant variability in the relative signal intensity of product ions within peptide tandem mass spectra is commonly observed. Furthermore, instrument-specific patterns of fragmentation are observed, even where a common mechanism of ion heating is responsible for generation of the product ions. This information is currently not fully exploited within database searching strategies, which motivated the present study to examine a large dataset of tandem mass spectra derived from multiple instrumental platforms. Here, we report marked global differences in the product ion spectra of protonated tryptic peptides generated from two of the most common proteomic platforms, namely tandem quadrupole-time-of-flight and quadrupole ion trap instruments. Specifically, quadrupole-time-of-flight tandem mass spectra show a significant under-representation of N-terminal b-type fragments in comparison to quadrupole ion trap product ion spectra. Energy-resolved mass spectrometry experiments conducted upon test tryptic peptides clarify this disparity: b-type ions are significantly less stable than their C-terminal y-type counterparts, which contain strongly basic residues. Secondary fragmentation processes occurring within the tandem quadrupole-time-of-flight device account for the observed differences, whereas such secondary product ion generation does not occur to a significant extent with resonant excitation performed within the quadrupole ion trap. We suggest that incorporating this stability information into database searching strategies has the potential to significantly improve the veracity of peptide ion identifications. Copyright © 2009 John Wiley & Sons, Ltd. [source]


Air–sea exchanges in the equatorial area from the EQUALANT99 dataset: Bulk parametrizations of turbulent fluxes corrected for airflow distortion

THE QUARTERLY JOURNAL OF THE ROYAL METEOROLOGICAL SOCIETY, Issue 610 2005
A. Brut
Abstract Turbulent fluxes of momentum, sensible heat and water vapour were calculated using both the eddy covariance (EC) and the inertial dissipation (ID) methods applied to data collected on board the research vessel La Thalassa during 40 days of the EQUALANT99 oceanographic campaign. The aim of this experiment was to establish accurate parametrizations of air–sea fluxes for the equatorial Atlantic area from a large dataset. However, the accuracy of turbulent fluxes measured aboard ships is strongly affected by the distortion of airflow patterns generated by obstacles such as the ship and mast. For the EQUALANT99 experiment, the effects of airflow distortion were estimated using physical simulations in a water channel. To reproduce the conditions of the campaign, a neutral boundary layer was simulated in the water channel and a detailed model of the ship La Thalassa was built. Correction coefficients for the mean wind speed were evaluated from these physical simulations. They show a dependence on both the azimuth angle of the flow (i.e. the horizontal direction of the flow with respect to the ship's longitudinal axis) and the angle of incidence of the wind. The correction for airflow distortion was applied to the measured wind speed and also included in the flux computation using the ID method. Compared with earlier studies, which applied a single correction per flux sample, our results for the corrected transfer coefficients show a greater dependence on neutral wind speed than previous parametrizations; the method also gives encouraging results, with a decrease in the scatter of the transfer coefficient parametrizations. However, the distortion could not be corrected for in the fluxes calculated using the EC method, because this technique integrates a wide range of turbulence scales for which the airflow distortion cannot be simulated in a water channel. Fluxes computed using the ID and EC methods are presented and compared in order to determine which method, in the configuration of the EQUALANT99 experiment, provides the best resulting transfer coefficients. According to the results, fluxes of momentum and latent heat computed by ID were better for deriving the drag and humidity coefficients. The EC method seemed better adapted than the ID method for calculating sensible-heat fluxes, although a high scatter remained in the Stanton neutral number. Copyright © 2005 Royal Meteorological Society [source]
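
For context, these are the standard bulk aerodynamic formulae that such parametrizations calibrate, in a minimal sketch; the neutral transfer coefficients below are illustrative round numbers, not the EQUALANT99 estimates.

```python
# Bulk formulae: tau = rho*Cd*U^2, H = rho*cp*Ch*U*(Ts - Ta),
# LE = rho*Lv*Ce*U*(qs - qa), with U the neutral 10 m wind speed.
rho, cp, Lv = 1.2, 1004.0, 2.5e6        # air density, heat capacity, latent heat
Cd, Ch, Ce = 1.2e-3, 1.1e-3, 1.2e-3     # assumed neutral transfer coefficients

def bulk_fluxes(U10n, Ts, Ta, qs, qa):
    tau = rho * Cd * U10n**2                      # momentum flux (N m-2)
    H = rho * cp * Ch * U10n * (Ts - Ta)          # sensible heat flux (W m-2)
    LE = rho * Lv * Ce * U10n * (qs - qa)         # latent heat flux (W m-2)
    return tau, H, LE

print(bulk_fluxes(U10n=7.0, Ts=28.0, Ta=26.5, qs=0.024, qa=0.017))
```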


Origin, diversification and conservation status of talus snails in the Pinaleño Mountains: a conservation biogeographic study

ANIMAL CONSERVATION, Issue 3 2010
K. F. Weaver
Abstract For many taxa, determining conservation status and priority is impeded by lack of adequate taxonomic and range data. For these problematic groups, we propose combining molecular techniques with careful geographic sampling to evaluate the validity, extent and phylogenetic relatedness of the proposed units of diversity. We employed such a strategy to document monophyletic lineages, range extents and phylogenetic relatedness for talus snails (genus Sonorella) in the Pinaleño Mountains of Arizona, an isolated range that has the most vertical relief of any of the sky islands in Arizona. Three of the four species found in the Pinaleño Mountains have been considered candidate species for protection under the Endangered Species Act. Further, one of these taxa, Sonorella macrophallus, is of particular concern and was protected under a USFS conservation agreement until 2004, due to its presumed endemicity to a narrow portion of one canyon. We collected a large dataset of 12S and COI mitochondrial DNA, and subsamples of reproductive morphology from specimens collected throughout the Pinaleños and from adjacent ranges (e.g. the Huachucas, Chiricahuas and Santa Catalinas). We generated a phylogeny based on the mitochondrial data, and matched clades with named species utilizing reproductive morphology. Our results show that both S. macrophallus and Sonorella imitator are relatively widespread across the Pinaleños while Sonorella grahamensis and Sonorella christenseni are restricted to very small areas. These results dramatically change our previous knowledge about range extents, especially for S. macrophallus. Given these results, land managers may need to reassess the status of all four Sonorella species. Finally, all Sonorella species from the Pinaleños are more closely related to each other than to other taxa on other ranges. This result strongly suggests that diversification of the four Sonorella species in the Pinaleños occurred in situ. [source]


Air travel and the risk of deep vein thrombosis

AUSTRALIAN AND NEW ZEALAND JOURNAL OF PUBLIC HEALTH, Issue 1 2006
Niels G. Becker
Background: The magnitude of the risk of venous thromboembolism (VTE) following air travel has been difficult to resolve due to lack of adequate data. We determine the association more precisely by using a large dataset and an improved method of analysis. Method: Data on air-travel history for each of 5,196 patients hospitalised for VTE in Western Australia from 1981 to 1999 is analysed using a log-linear regression model for the probability that a flight triggers VTE and for the baseline hazard rate for VTE hospitalisation. Results: The risk of VTE being triggered on the day of an international flight relative to a flight-free day is 29.8 (95% CI 22.4-37.3). Evidence that this relative risk depends on age is weak (p=0.06), but the absolute risk clearly depends on age. The annual relative risk for an individual taking one international flight, compared with an individual of the same age taking no flight, is estimated to be 1.079. The estimated median time from flight to hospital admission is 4.7 days (95% CI 3.8-5.6) and the estimated 95th percentile is 13.3 (95% CI 10.3-16.8). Conclusions: Evidence for an association between international air travel and VTE hospitalisation is strong and passengers should be advised on ways to minimise risk during long flights. While 29.8 is a large relative risk, it must be remembered that the baseline risk is very small and the relative risk applies only to the unobserved triggering of a deep vein thrombosis episode on the day of travel; the consequent hospitalisation occurs on one of numerous ensuing days. [source]
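
A back-of-envelope check of how the per-flight-day figure and the annual figure relate, assuming the annual relative risk simply averages one flight day at 29.8 with 364 baseline days; this composition is our assumption for illustration, not the paper's stated model.

```python
rr_flight_day, days = 29.8, 365
annual_rr = (rr_flight_day + (days - 1)) / days   # one flight day, 364 baseline days
print(f"annual relative risk ~ {annual_rr:.3f}")  # ~1.079, matching the abstract
```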


Hierarchical Spatial Modeling of Additive and Dominance Genetic Variance for Large Spatial Trial Datasets

BIOMETRICS, Issue 2 2009
Andrew O. Finley
Summary This article expands upon recent interest in Bayesian hierarchical models in quantitative genetics by developing spatial process models for inference on additive and dominance genetic variance within the context of large spatially referenced trial datasets. Direct application of such models to large spatial datasets is, however, computationally infeasible because of the cubic-order matrix algorithms involved in estimation. The situation is even worse in Markov chain Monte Carlo (MCMC) contexts, where such computations are performed for several iterations. Here, we discuss approaches that help obviate these hurdles without sacrificing the richness in modeling. For genetic effects, we demonstrate how an initial spectral decomposition of the relationship matrices negates the expensive matrix inversions required in previously proposed MCMC methods. For spatial effects, we outline two approaches for circumventing the prohibitively expensive matrix decompositions: the first leverages analytical results from Ornstein–Uhlenbeck processes that yield computationally efficient tridiagonal structures, whereas the second derives a modified predictive process model from the original model by projecting its realizations to a lower-dimensional subspace, thereby reducing the computational burden. We illustrate the proposed methods using a synthetic dataset with additive and dominance genetic effects and anisotropic spatial residuals, and a large dataset from a Scots pine (Pinus sylvestris L.) progeny study conducted in northern Sweden. Our approaches enable us to provide a comprehensive analysis of this large trial, which amply demonstrates that, in addition to violating basic assumptions of the linear model, ignoring spatial effects can result in downwardly biased measures of heritability. [source]
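
A sketch of the spectral shortcut for the genetic effects: decompose the relationship matrix once, then reuse the eigenvectors so each MCMC iteration can solve against a scaled version of it without a fresh cubic-cost inversion. The matrix below is a synthetic stand-in for a pedigree-derived relationship matrix.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500
B = rng.standard_normal((n, n))
A = B @ B.T / n + np.eye(n)               # symmetric positive definite stand-in

evals, U = np.linalg.eigh(A)              # one-off O(n^3) decomposition

def solve_A_sigma(b, sigma2):
    """Solve (sigma2 * A) x = b reusing the cached eigendecomposition:
    (sigma2*A)^{-1} = U diag(1/(sigma2*evals)) U^T."""
    return U @ ((U.T @ b) / (sigma2 * evals))

b = rng.standard_normal(n)
x = solve_A_sigma(b, sigma2=0.3)          # cheap per MCMC iteration
print(np.allclose(0.3 * A @ x, b))        # True
```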


On the use of generalized linear models for interpreting climate variability

ENVIRONMETRICS, Issue 7 2005
Richard E. Chandler
Abstract Many topical questions in climate research can be reduced to either of two related problems: understanding how various different components of the climate system affect each other, and quantifying changes in the system. This article aims to justify the addition of generalized linear models to the climatologist's toolkit, by demonstrating that they offer an intuitive and flexible approach to such problems. In particular, we provide some suggestions as to how 'typical' climatological data structures may be represented within the GLM framework. Recurring themes include methods for space–time data and the need to cope with large datasets. The ideas are illustrated using a dataset of monthly U.S. temperatures. Copyright © 2005 John Wiley & Sons, Ltd. [source]
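
A minimal example of the kind of GLM structure being advocated, using synthetic monthly temperatures with harmonic seasonal terms and a linear trend; the covariates, Gaussian family and statsmodels usage are our assumptions for illustration.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
months = np.arange(360)                          # 30 years of monthly data
season = 2 * np.pi * (months % 12) / 12
temp = (12 + 10 * np.sin(season) + 0.01 * months
        + rng.normal(0, 1.5, months.size))       # synthetic temperatures

# Design matrix: intercept, a harmonic pair for seasonality, decadal trend
X = sm.add_constant(np.column_stack([np.sin(season), np.cos(season),
                                     months / 120.0]))
model = sm.GLM(temp, X, family=sm.families.Gaussian()).fit()
print(model.params)
```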


A Wet/Wet Differential Pressure Sensor for Measuring Vertical Hydraulic Gradient

GROUND WATER, Issue 1 2010
Brad G. Fritz
Vertical hydraulic gradient is commonly measured in rivers, lakes, and streams for studies of groundwater–surface water interaction. While a number of methods with subtle differences have been applied, these methods can generally be separated into two categories: measuring surface water elevation and pressure in the subsurface separately, or making direct measurements of the head difference with a manometer. Making separate head measurements allows for the use of electronic pressure sensors, providing large datasets that are particularly useful when the vertical hydraulic gradient fluctuates over time. On the other hand, using a manometer-based method provides an easier and more rapid measurement with a simpler computation to calculate the vertical hydraulic gradient. In this study, we evaluated a wet/wet differential pressure sensor for use in measuring vertical hydraulic gradient. This approach combines the advantage of the high-temporal-frequency measurements obtained with instrumented piezometers with the simplicity and reduced potential for human-induced error of a manometer board method. Our results showed that the wet/wet differential pressure sensor provided results comparable to those of more traditional methods, making it an acceptable method for future use. [source]
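
The computation the sensor supports is straightforward; a sketch with assumed numbers: the differential pressure reading converts to a head difference via dh = dP/(rho*g), and the vertical hydraulic gradient is dh/dl.

```python
RHO_G = 1000.0 * 9.81          # water density (kg m-3) times gravity (m s-2)

def vertical_hydraulic_gradient(dP_pa, dl_m):
    """VHG from a differential pressure (Pa) across a vertical separation (m)."""
    dh = dP_pa / RHO_G         # head difference in metres of water
    return dh / dl_m           # dimensionless gradient

print(vertical_hydraulic_gradient(dP_pa=150.0, dl_m=0.5))   # ~0.031
```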


Data clustering as an optimum-path forest problem with applications in image analysis

INTERNATIONAL JOURNAL OF IMAGING SYSTEMS AND TECHNOLOGY, Issue 2 2009
Leonardo Marques Rocha
Abstract We propose an approach for data clustering based on optimum-path forest. The samples are taken as nodes of a graph whose arcs are defined by an adjacency relation. The nodes are weighted by their probability density values (pdf), and a connectivity function is maximized such that each maximum of the pdf becomes the root of an optimum-path tree (cluster), composed of samples "more strongly connected" to that maximum than to any other root. We discuss the advantages over other pdf-based approaches and present extensions to large datasets, with results for interactive image segmentation and for fast, accurate, and automatic brain tissue classification in magnetic resonance (MR) images. We also include experimental comparisons with other clustering approaches. © 2009 Wiley Periodicals, Inc. Int J Imaging Syst Technol, 19, 50–68, 2009. [source]


Bayesian inference strategies for the prediction of genetic merit using threshold models with an application to calving ease scores in Italian Piemontese cattle

JOURNAL OF ANIMAL BREEDING AND GENETICS, Issue 4 2002
K. Kizilkaya
Summary First parity calving difficulty scores from Italian Piemontese cattle were analysed using a threshold mixed effects model. The model included the fixed effects of age of dam and sex of calf and their interaction, and the random effects of sire, maternal grandsire, and herd-year-season. Covariances between sire and maternal grandsire effects were modelled using a numerator relationship matrix based on male ancestors. Field data consisted of 23 953 records collected between 1989 and 1998 from 4741 herd-year-seasons. Variance and covariance components were estimated using two alternative approximate marginal maximum likelihood (MML) methods, one based on expectation-maximization (EM) and the other based on Laplacian integration. Inferences were compared to those based on three separate runs or sequences of Markov chain Monte Carlo (MCMC) sampling in order to assess the validity of approximate MML estimates derived from data with similar size and design structure. Point estimates of direct heritability were 0.24, 0.25 and 0.26 for EM, Laplacian and MCMC (posterior mean), respectively, whereas the corresponding maternal heritability estimates were 0.10, 0.11 and 0.12. The covariance between additive direct and maternal effects was found to be not different from zero based on MCMC-derived confidence sets. The conventional joint modal estimates of sire effects and associated standard errors based on MML estimates of variance and covariance components differed little from the respective posterior means and standard deviations derived from MCMC. Therefore, there may be little need to pursue computation-intensive MCMC methods for inference on genetic parameters and genetic merits using conventional threshold sire and maternal grandsire models for large datasets on calving ease. [source]


Statistical Properties of the K-Index for Detecting Answer Copying

JOURNAL OF EDUCATIONAL MEASUREMENT, Issue 2 2002
Leonardo S. Sotaridona
We investigated the statistical properties of the K-index (Holland, 1996), which can be used to detect copying behavior on a test. A simulation study was conducted to investigate the applicability of the K-index for small, medium, and large datasets. Furthermore, the Type I error rate and the detection rate of this index were compared with those of the copying index ω (Wollack, 1997). Several approximations were used to calculate the K-index. Results showed that all approximations were able to hold the Type I error rates below the nominal level. Results further showed that using ω resulted in higher detection rates than the K-indices for small and medium sample sizes (100 and 500 simulees). [source]
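
A heavily simplified sketch of the idea behind such copying indices, not Holland's exact K-index: count the matches on items a source examinee answered incorrectly and compare with a binomial tail whose match probability is estimated from the other examinees. The data, sample sizes and baseline estimator are all assumptions.

```python
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(6)
n_items, n_examinees, n_choices = 40, 500, 4
answers = rng.integers(0, n_choices, size=(n_examinees, n_items))
key = rng.integers(0, n_choices, size=n_items)

source = answers[0]
wrong = source != key                       # items the source missed
copier = source.copy()                      # a deliberate copier: all matches

# Baseline match rate on the source's wrong items, from the other examinees
baseline = (answers[1:, wrong] == source[wrong]).mean()

m = int((copier[wrong] == source[wrong]).sum())
p_value = binom.sf(m - 1, int(wrong.sum()), baseline)   # P(matches >= m)
print(f"matches={m}/{int(wrong.sum())}, baseline={baseline:.2f}, p={p_value:.2g}")
```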


Ecological relevance of temporal stability in regional fish catches

JOURNAL OF FISH BIOLOGY, Issue 5 2003
H. Hinz
The concept of habitat selection based on 'Ideal Free Distribution' theory suggests that areas of high suitability may attract larger quantities of fishes than less suitable or unsuitable areas. Catch data from groundfish surveys were used to identify areas of consistently high densities of whiting Merlangius merlangus, cod Gadus morhua and haddock Melanogrammus aeglefinus in the Irish Sea, and of plaice Pleuronectes platessa, sole Solea solea and lemon sole Microstomus kitt in the English Channel, over periods of 10 and 9 years respectively. A method was introduced to objectively delineate, from large datasets, areas of the seabed that held consistently high numbers of fishes. These areas may reflect important habitat characteristics that merit further scientific investigation with respect to 'Essential Fish Habitats' (EFH). In addition, the number of stations with consistently high abundances of fishes and the number of stations where no fishes were caught gave an indication of the site specificity of the fish species analysed. Among the gadoids, whiting was found to be less site specific than cod and haddock, while among the flatfishes, plaice and sole were less site specific than lemon sole. The findings are discussed in the context of previously published studies on dietary specialism. The site specificity of demersal fishes has implications for the siting of marine protected areas, as fish species with a strong habitat affinity can be expected to benefit more from such management schemes. [source]


Layers of nocturnal insect migrants at high-altitude: the influence of atmospheric conditions on their formation

AGRICULTURAL AND FOREST ENTOMOLOGY, Issue 1 2010
Curtis R. Wood
1. Radar studies of nocturnal insect migration have often found that the migrants tend to form well-defined horizontal layers at a particular altitude.
2. In previous short-term studies, nocturnal layers were usually observed to occur at the same altitude as certain meteorological features, most notably at the altitudes of temperature inversion tops or nocturnal wind jets.
3. Statistical analyses are presented of 4 years of data that compared the presence, sharpness and duration of nocturnal layer profiles, observed using continuously operating entomological radar, with meteorological variables at typical layer altitudes over the U.K.
4. Analysis of these large datasets demonstrated that temperature was the foremost meteorological factor persistently associated with the presence and formation of longer-lasting and sharper layers of migrating insects over the southern U.K. [source]


Short-term forecasting of GDP using large datasets: a pseudo real-time forecast evaluation exercise

JOURNAL OF FORECASTING, Issue 7 2009
G. Rünstler
Abstract This paper performs a large-scale forecast evaluation exercise to assess the performance of different models for the short-term forecasting of GDP, drawing on large datasets from ten European countries. Several versions of factor models are considered and cross-country evidence is provided. The forecasting exercise is performed in a simulated real-time context, which takes account of publication lags in the individual series. In general, we find that factor models perform best, and models that exploit monthly information outperform models that use purely quarterly data. However, the improvement over the simpler, quarterly models remains limited. Copyright © 2009 John Wiley & Sons, Ltd. [source]


Forecasting German GDP using alternative factor models based on large datasets

JOURNAL OF FORECASTING, Issue 4 2007
Christian Schumacher
Abstract This paper discusses the forecasting performance of alternative factor models based on a large panel of quarterly time series for the German economy. One model extracts factors by static principal components analysis; the second model is based on dynamic principal components obtained using frequency domain methods; the third model is based on subspace algorithms for state-space models. Out-of-sample forecasts show that the forecast errors of the factor models are on average smaller than the errors of a simple autoregressive benchmark model. Among the factor models, the dynamic principal component model and the subspace factor model outperform the static factor model in most cases in terms of mean-squared forecast error. However, the forecast performance depends crucially on the choice of appropriate information criteria for the auxiliary parameters of the models. In the case of misspecification, rankings of forecast performance can change severely. Copyright © 2007 John Wiley & Sons, Ltd. [source]


ESTIMATION AND HYPOTHESIS TESTING FOR NONPARAMETRIC HEDONIC HOUSE PRICE FUNCTIONS

JOURNAL OF REGIONAL SCIENCE, Issue 3 2010
Daniel P. McMillen
ABSTRACT In contrast to the rigid structure of standard parametric hedonic analysis, nonparametric estimators control for misspecified spatial effects while using highly flexible functional forms. Despite these advantages, nonparametric procedures are still not used extensively for spatial data analysis due to perceived difficulties associated with estimation and hypothesis testing. We demonstrate that nonparametric estimation is feasible for large datasets with many independent variables, offering statistical tests of individual covariates and tests of model specification. We show that fixed parameterization of distance to the nearest rapid transit line is a misspecification and that pricing of access to this amenity varies across neighborhoods within Chicago. [source]
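
A sketch of the nonparametric machinery involved: a kernel-weighted local regression lets the implicit price of transit access vary with distance itself. The housing data, tricube kernel and bandwidth are illustrative assumptions, not the authors' specification.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 2000
dist_transit = rng.uniform(0, 5, n)              # km to nearest transit line
sqft = rng.uniform(800, 3000, n)
log_price = (12 - 0.10 * dist_transit + 4e-4 * sqft
             + rng.normal(0, 0.2, n))            # synthetic hedonic data

def local_fit(x0, X, y, h=0.75):
    """Tricube-weighted OLS around x0 in the distance covariate (column 1)."""
    u = np.abs(X[:, 1] - x0) / h
    w = np.where(u < 1, (1 - u**3) ** 3, 0.0)
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(sw[:, None] * X, sw * y, rcond=None)
    return beta

X = np.column_stack([np.ones(n), dist_transit, sqft])
for x0 in (0.5, 2.5, 4.5):
    b = local_fit(x0, X, log_price)
    print(f"local distance gradient near {x0} km: {b[1]:+.3f}")
```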


Quick prediction of the retention of solutes in 13 thin layer chromatographic screening systems on silica gel by classification and regression trees

JOURNAL OF SEPARATION SCIENCE, JSS, Issue 15 2008
Łukasz Komsta
Abstract The use of classification and regression trees (CART) was studied in a quantitative structure–retention relationship (QSRR) context to predict retention in 13 thin layer chromatographic screening systems on silica gel, for which large datasets of interlaboratory-determined retention are available. The response (dependent variable) was the RM factor, while a set of atomic contributions and functional substituent counts was used as the explanatory dataset. The trees were investigated for optimal complexity (number of leaves) by external validation and internal cross-validation. Their predictive performance is slightly lower than that of the full atomic contribution model, but their main advantage is simplicity. Retention prediction with the proposed trees can be done without a computer or even a pocket calculator. [source]
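
A toy version of the CART/QSRR idea on synthetic substituent counts (the descriptor names are invented for illustration); the printed tree shows why prediction needs no computer: the splits can be read off by hand.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(8)
n = 300
X = rng.integers(0, 4, size=(n, 3))        # assumed counts of three substituents
rm = 0.6 * X[:, 0] - 0.3 * X[:, 1] + 0.1 * X[:, 2] + rng.normal(0, 0.1, n)

# A shallow tree keeps the model small enough to apply manually
tree = DecisionTreeRegressor(max_leaf_nodes=6).fit(X, rm)
print(export_text(tree, feature_names=["n_OH", "n_NO2", "n_CH3"]))
```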


Use of the ecological information system SynBioSys for the analysis of large datasets

JOURNAL OF VEGETATION SCIENCE, Issue 4 2007
Joop H.J. Schaminée
Abstract The rapid developments in computer techniques and the availability of large datasets open new perspectives for vegetation analysis aimed at a better understanding of the ecology and functioning of ecosystems and their underlying mechanisms. Information systems prove to be helpful tools in this new field. Such information systems may integrate different biological levels, viz. species, community and landscape. They incorporate a GIS platform for the visualization of the various layers of information, enabling the analysis of patterns and processes which relate the individual levels. An example of a newly developed information system is SynBioSys Europe, an initiative of the European Vegetation Survey (EVS). For the individual levels of the system, specific sources are available, notably national and regional Turboveg databases for the community level and data from the recently published European Map of Natural Vegetation for the landscape level. The structure of the system and its underlying databases allows user-defined queries. With regard to application, such information systems may play a vital role in European nature planning, such as the implementation of the EU programme Natura 2000. To illustrate the scope and perspectives of the system, some examples from The Netherlands are presented, dealing with long-term changes in grassland ecosystems, including shifts in distribution, floristic composition, and ecological indicator values. [source]


Pilot Study Examining the Utility of Microarray Data to Identify Genes Associated with Weight in Transplant Recipients

NURSING & HEALTH SCIENCES, Issue 2 2006
Ann Cashion
Purpose/Methods: Obesity, a complex, polygenic disorder and a growing epidemic in transplant recipients, is a risk factor for chronic diseases. This secondary data analysis examined whether microarray technologies and bioinformatics could find differences in gene expression profiles between liver transplant recipients with low Body Mass Index (BMI < 29; n = 5) vs. high BMI (> 29; n = 7). Blood was hybridized on the Human U133 Plus 2 GeneChip (Affymetrix) and analyzed using GeneSpring software. Results: Groups were similar in age and race, but not gender. Expression levels of 852 genes differed between the low and high BMI groups (P < 0.05). The majority (562) of the changes associated with high BMI were decreases in transcript levels. Among the 852 genes associated with BMI, 263 and 14 genes were affected greater than 2-fold or 5-fold, respectively. Following functional classification using Gene Ontology (GO), we found that 19 genes (P < 0.00008) belonged to defense response and 15 genes (P < 0.00006) belonged to immune response. Conclusion: These data could point the way toward therapeutic interventions and help identify those at risk. These results demonstrate that we can (1) extract high-quality RNA from immunosuppressed patients; and (2) manage large datasets and perform statistical and functional analysis. [source]
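
A sketch of the basic per-gene screening such analyses perform, on a synthetic expression matrix: a Welch t-test between the low- and high-BMI groups plus a 2-fold-change cutoff, mirroring the P < 0.05 and 2-fold criteria reported; the data and log2 scale are assumptions.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(9)
n_genes = 5000
low = rng.normal(8, 1, size=(n_genes, 5))        # 5 low-BMI recipients
high = rng.normal(8, 1, size=(n_genes, 7))       # 7 high-BMI recipients
high[:300] -= 1.5                                # simulate decreased transcripts

t, p = ttest_ind(low, high, axis=1, equal_var=False)   # per-gene Welch t-test
log2_fc = high.mean(axis=1) - low.mean(axis=1)         # log2 scale assumed
hits = (p < 0.05) & (np.abs(log2_fc) >= 1)             # >= 2-fold change
print(f"{hits.sum()} genes pass the P<0.05 and 2-fold criteria")
```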


The World Wide Web and the U.S. Political News Market

AMERICAN JOURNAL OF POLITICAL SCIENCE, Issue 2 2010
Norman H. Nie
We propose a framework for understanding how the Internet has affected the U.S. political news market. The framework is driven by the lower cost of production for online news and consumers' tendency to seek out media that conform to their own beliefs. The framework predicts that consumers of Internet news sources should hold more extreme political views and be interested in more diverse political issues than those who solely consume mainstream television news. We test these predictions using two large datasets with questions about news exposure and political views. Generally speaking, we find that consumers of generally left-of-center (right-of-center) cable news sources who combine their cable news viewing with online sources are more liberal (conservative) than those who do not. We also find that those who use online news content are more likely than those who consume only television news content to be interested in niche political issues. [source]