Data Generation (data + generation)


Selected Abstracts


DYNAMIC SEARCH SPACE TRANSFORMATIONS FOR SOFTWARE TEST DATA GENERATION

COMPUTATIONAL INTELLIGENCE, Issue 1 2008
Ramón Sagarna
Among the tasks in software testing, test data generation is particularly difficult and costly. In recent years, several approaches that use metaheuristic search techniques to automatically obtain the test inputs have been proposed. Although work in this field is very active, little attention has been paid to the selection of an appropriate search space. The present work describes an approach to this issue. More precisely, two approaches which employ an Estimation of Distribution Algorithm as the metaheuristic technique are explained. In both cases, different regions are considered in the search for the test inputs. Moreover, to start from a region near the one containing the optimum, the definition of the initial search space incorporates static information extracted from the source code of the software under test. If this information is not enough to complete the definition, then a grid search method is used. The experimental results suggest that this is a promising option for enhancing the test data generation process. [source]
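The idea in the abstract above can be illustrated with a minimal sketch. It is not the author's algorithm: it assumes a hypothetical branch condition (`x == 2 * y and x > 1000`), a simple univariate Gaussian model as the Estimation of Distribution Algorithm, and a crude region transformation that re-centers the search region on the best input found so far. All names and parameter values are illustrative.

```python
import random
import statistics


def branch_distance(x, y):
    """Distance to taking the hypothetical branch `x == 2 * y and x > 1000`.

    Zero means the inputs (x, y) cover the branch.
    """
    return abs(x - 2 * y) + max(0, 1001 - x)


def eda_search(center=(0, 0), half_width=100, pop_size=60, keep=15,
               generations=200, seed=0):
    """Univariate Gaussian EDA over a search region that moves between generations."""
    rng = random.Random(seed)
    best, best_dist = tuple(center), branch_distance(*center)
    means = [float(c) for c in center]
    stds = [half_width / 2.0] * 2
    for _ in range(generations):
        if best_dist == 0:
            break
        lo = [c - half_width for c in center]
        hi = [c + half_width for c in center]
        # Sample integer inputs from the model, clipped to the current region.
        population = [
            tuple(int(min(hi[i], max(lo[i], round(rng.gauss(means[i], stds[i])))))
                  for i in range(2))
            for _ in range(pop_size)
        ]
        population.sort(key=lambda ind: branch_distance(*ind))
        elite = population[:keep]
        if branch_distance(*elite[0]) < best_dist:
            best, best_dist = elite[0], branch_distance(*elite[0])
        # Re-estimate the per-variable Gaussian model from the selected inputs.
        means = [statistics.mean(ind[i] for ind in elite) for i in range(2)]
        stds = [max(1.0, statistics.pstdev([ind[i] for ind in elite]))
                for i in range(2)]
        # Assumed region transformation: re-center on the best input so far.
        center = best
    return best, best_dist


best, dist = eda_search()
print("best inputs:", best, "branch distance:", dist)
```

In this sketch the initial `center` plays the role of the static information: seeding it with a constant found in the code under test (for example `(1000, 500)`) would start the search closer to the target region.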


The Impact of Training Intensity on Establishment Productivity

INDUSTRIAL RELATIONS, Issue 1 2006
Article first published online: 23 DEC 200, THOMAS ZWICK
The empirical literature on productivity effects of continuing training is constantly increasing. However, the results on this subject differ widely. Explanations for this worrying diversity seem to lie in differences between countries, labor market institutions, and data generation on one hand, and in differences between the underlying estimation techniques on the other (Bartel, 2000). This paper concentrates on the latter problem and shows how results vary with different estimation techniques. [source]


The ethics of research using electronic mail discussion groups

JOURNAL OF ADVANCED NURSING, Issue 5 2005
Debbie Kralik PhD RN
Aim. The aim of this paper is to identify and discuss the ethical considerations that have confronted and challenged the research team when researchers facilitate conversations using private electronic mail discussion lists. Background. The use of electronic mail group conversations, as a collaborative data generation method, remains underdeveloped in nursing. Ethical challenges associated with this approach to data generation have only begun to be considered. Since receipt of ethics approval for a study titled 'Describing transition with people who live with chronic illness', we have been challenged by many ethical dilemmas; hence we believe it is timely to share the issues that have confronted the research team. These discussions are essential so we can understand the possibilities for research interaction, communication, and collaboration made possible by advanced information technologies. Discussion. Our experiences in this study have increased our awareness of the need for ongoing ethical discussions about privacy, confidentiality, consent, accountability and openness underpinning research with human participants when generating data using an electronic mail discussion group. We describe how we work at upholding these ethical principles, focusing on informed consent, participant confidentiality and privacy, the participants as threats to themselves and one another, public/private confusion, employees with access, hackers and threats from the researchers. Conclusion. A variety of complex issues arise during cyberspace research that can make the application of traditional ethical standards troublesome. Communication in cyberspace alters the temporal, spatial and sensory components of human interaction, thereby challenging traditional ethical definitions and calling into question some basic assumptions about identity and one's right to keep aspects of it confidential. Nurse researchers are bound by human research ethics protocols; however, the nature of research by electronic mail generates moral issues as well as ethical concerns. Vigilance by researchers is required to ensure that data are viewed within the scope of the enabling ethics approval. [source]


Identifiability of parameters and behaviour of MCMC chains: a case study using the reaction norm model

JOURNAL OF ANIMAL BREEDING AND GENETICS, Issue 2 2009
M.M. Shariati
Summary Markov chain Monte Carlo (MCMC) enables fitting complex hierarchical models that may adequately reflect the process of data generation. Some of these models may contain more parameters than can be uniquely inferred from the distribution of the data, causing non-identifiability. The reaction norm model with unknown covariates (RNUC) is a model in which unknown environmental effects can be inferred jointly with the remaining parameters. The problem of identifiability of parameters at the level of the likelihood and the associated behaviour of MCMC chains were discussed using the RNUC as an example. It was shown theoretically that when environmental effects (covariates) are considered as random effects, estimable functions of the fixed effects, (co)variance components and genetic effects are identifiable as well as the environmental effects. When the environmental effects are treated as fixed and there are other fixed factors in the model, the contrasts involving environmental effects, the variance of environmental sensitivities (genetic slopes) and the residual variance are the only identifiable parameters. These different identifiability scenarios were generated by changing the formulation of the model and the structure of the data and the models were then implemented via MCMC. The output of MCMC sampling schemes was interpreted in the light of the theoretical findings. The erratic behaviour of the MCMC chains was shown to be associated with identifiability problems in the likelihood, despite propriety of posterior distributions, achieved by arbitrarily chosen uniform (bounded) priors. In some cases, very long chains were needed before the pattern of behaviour of the chain may signal the existence of problems. The paper serves as a warning concerning the implementation of complex models where identifiability problems can be difficult to detect a priori. 
We conclude that it would be good practice to experiment with a proposed model and to understand its features before embarking on a full MCMC implementation. [source]
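The link between likelihood non-identifiability and chain behaviour can be seen in a toy example. The sketch below is not the RNUC model: it assumes a deliberately over-parameterised model `y ~ N(a + b, 1)` with bounded uniform priors on `a` and `b`, and a plain random-walk Metropolis sampler. The data, bounds, and step size are illustrative assumptions. Only the sum `a + b` enters the likelihood, so the posterior is proper (thanks to the bounded priors) but `a` alone is not identifiable.

```python
import math
import random
import statistics


def log_posterior(a, b, data, bound=10.0):
    """Log posterior up to a constant: uniform priors on [-bound, bound]."""
    if abs(a) > bound or abs(b) > bound:
        return -math.inf
    # The likelihood depends on a and b only through their sum.
    return -0.5 * sum((y - (a + b)) ** 2 for y in data)


def metropolis(data, steps=20000, step_sd=0.5, seed=1):
    """Random-walk Metropolis sampler over (a, b)."""
    rng = random.Random(seed)
    a, b = 0.0, 0.0
    current = log_posterior(a, b, data)
    chain = []
    for _ in range(steps):
        a_new = a + rng.gauss(0.0, step_sd)
        b_new = b + rng.gauss(0.0, step_sd)
        proposed = log_posterior(a_new, b_new, data)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, proposed - current)):
            a, b, current = a_new, b_new, proposed
        chain.append((a, b))
    return chain


data_rng = random.Random(0)
data = [3.0 + data_rng.gauss(0.0, 1.0) for _ in range(50)]
chain = metropolis(data)[2000:]  # discard burn-in

sd_a = statistics.pstdev(a for a, _ in chain)
sd_sum = statistics.pstdev(a + b for a, b in chain)
print("spread of a alone:", sd_a)
print("spread of a + b:  ", sd_sum)
```

In a run like this, the trace of `a` drifts across the prior range while the trace of `a + b` stays tight, which is the kind of pattern a short chain may fail to reveal.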


Modeling and predictive control using fuzzy logic: Application for a polymerization system

AICHE JOURNAL, Issue 4 2010
Nádson M. N. Lima
Abstract In this study, a predictive control system based on Takagi-Sugeno type fuzzy models was developed for a polymerization process. Such processes typically have highly nonlinear dynamic behavior, causing controllers based on conventional internal models to perform poorly or to require considerable tuning effort. The copolymerization of methyl methacrylate with vinyl acetate was considered for analysis of the performance of the proposed control system. A nonlinear mathematical model which describes the reaction plant was used for data generation and implementation of the controller. The modeling using the fuzzy approach showed an excellent capacity for output prediction as a function of dynamic data input. The performance of the proposed control system and that of dynamic matrix control were compared for regulatory and servo problems, and the results showed that the control system design is robust, simple to implement, and provides a better response than conventional predictive control. © 2009 American Institute of Chemical Engineers AIChE J, 2010 [source]
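For readers unfamiliar with Takagi-Sugeno models, the following minimal sketch shows how such a model produces a prediction: each rule pairs a fuzzy membership function with a local linear model, and the output is the membership-weighted average of the local outputs. The two rules, their Gaussian memberships, and all coefficients are hypothetical and unrelated to the polymerization model in the paper.

```python
import math


def gaussian_membership(x, center, width):
    """Degree (0 to 1) to which x belongs to a fuzzy set centered at `center`."""
    return math.exp(-0.5 * ((x - center) / width) ** 2)


# Hypothetical rules: (center, width, slope, intercept).
RULES = [
    (0.0, 1.0, 1.0, 0.0),  # IF x is "low"  THEN y = 1.0 * x + 0.0
    (4.0, 1.0, 0.2, 3.0),  # IF x is "high" THEN y = 0.2 * x + 3.0
]


def ts_predict(x, rules=RULES):
    """First-order Takagi-Sugeno inference for a single input x."""
    weights = [gaussian_membership(x, c, w) for c, w, _, _ in rules]
    outputs = [slope * x + intercept for _, _, slope, intercept in rules]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("x is too far from every rule for a meaningful output")
    # Weighted average of the local linear models.
    return sum(w * y for w, y in zip(weights, outputs)) / total


for x in (0.0, 2.0, 4.0):
    print(x, ts_predict(x))
```

Midway between the two rule centers (`x = 2.0`) both rules fire equally, so the prediction is the plain average of the two local models; near a center, the corresponding local model dominates. This smooth blending is what lets a set of linear submodels approximate nonlinear dynamics.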


Snap-shots of live theatre: the use of photography to research governance in operating room nursing

NURSING INQUIRY, Issue 2 2003
Robin Riley
The use of photography is an underreported method of research in the nursing literature. This paper explores its use in an ethnographic research project, the fieldwork of which was undertaken by the first author. The aim was to examine the governance of operating room nursing in the clinical setting and the theoretical orientation was the work of Michel Foucault. The focus of this paper is on how photography was used as a means of data generation. To establish some context we begin by drawing on writers from sociology and anthropology to provide an overview of the status of vision and visual research methods in contemporary social research. We then move to a brief discussion of the uses of photography in social research and the limitations imposed by ethical considerations of its use in clinical nursing settings. The process and approach involved in this research project, and issues of analysis, are also discussed. Three 'snap-shots' of operating room nursing, taken by participants, are presented. Each is analysed in terms of its contributions to the research process as well as its substantive contribution to the theoretical framework and the research aims. [source]


Construction of a 'unigene' cDNA clone set by oligonucleotide fingerprinting allows access to 25 000 potential sugar beet genes

THE PLANT JOURNAL, Issue 5 2002
Ralf Herwig
Summary Access to the complete gene inventory of an organism is crucial to understanding physiological processes like development, differentiation, pathogenesis, or adaptation to the environment. Transcripts from many active genes are present at low copy numbers. Therefore, procedures that rely on random EST sequencing or on normalisation and subtraction methods have to produce massively redundant data to get access to low-abundance genes. Here, we present an improved oligonucleotide fingerprinting (ofp) approach to the genome of sugar beet (Beta vulgaris), a plant for which practically no molecular information has been available. To identify distinct genes and to provide a representative 'unigene' cDNA set for sugar beet, 159 936 cDNA clones were processed utilizing large-scale, high-throughput data generation and analysis methods. Data analysis yielded 30 444 ofp clusters reflecting the number of different genes in the original cDNA sample. A sample of 10 961 cDNA clones, each representing a different cluster, was selected for sequencing. Standard sequence analysis confirmed that 89% of these EST sequences did represent different genes. These results indicate that the full set of 30 444 ofp clusters represents up to 25 000 genes. We conclude that the ofp analysis pipeline is an accurate and effective way to construct large representative 'unigene' sets for any plant of interest with no requirement for prior molecular sequence data. [source]


The sheep genome reference sequence: a work in progress

ANIMAL GENETICS, Issue 5 2010
The International Sheep Genomics Consortium
Summary Until recently, the construction of a reference genome was performed using Sanger sequencing alone. The emergence of next-generation sequencing platforms now means reference genomes may incorporate sequence data generated from a range of sequencing platforms, each of which has different read length, systematic biases and mate-pair characteristics. The objective of this review is to inform the mammalian genomics community about the experimental strategy being pursued by the International Sheep Genomics Consortium (ISGC) to construct the draft reference genome of sheep (Ovis aries). Component activities such as data generation, sequence assembly and annotation are described, along with information concerning the key researchers performing the work. This aims to foster future participation from across the research community through the coordinated activities of the consortium. The review also serves as a 'marker paper' by providing information concerning the pre-publication release of the reference genome. This ensures the ISGC adheres to the framework for data sharing established at the recent Toronto International Data Release Workshop and provides guidelines for data users. [source]

