Data Matrix (data + matrix)


Kinds of Data Matrix

  • morphological data matrix


  • Selected Abstracts


    Ultrastructural study of spermiogenesis in the Jamaican Gray Anole, Anolis lineatopus (Reptilia: Polychrotidae)

    ACTA ZOOLOGICA, Issue 4 2010
    Justin L. Rheubert
    Rheubert, J.L., Wilson, B.S., Wolf, K.W. and Gribbins, K.M. 2010. Ultrastructural study of spermiogenesis in the Jamaican Gray Anole, Anolis lineatopus (Reptilia: Polychrotidae). Acta Zoologica (Stockholm) 91: 484–494. Abstract As the number of spermatozoal characters being described in reptiles increases, it is important to detail the ontogeny of the features leading to the mature morphology of the spermatozoa, which may give rise to more comprehensive data matrices for future phylogenetic analyses within the Reptilia. Therefore, spermiogenically active testes from Anolis lineatopus were investigated ultrastructurally to describe the intracellular changes that occur throughout spermiogenesis. The primary events of spermiogenesis (acrosome formation, nuclear condensation, and elongation) seen in A. lineatopus are similar to those previously described for other amniotes. Characters including a round perforatorium tip, stopper-like perforatorial base plate, open pits of nucleoplasm during condensation, and protein layers within the acrosome complex corroborate trends from previous studies in squamates. However, uniquely defined in A. lineatopus are the excessive amounts of endoplasmic reticulum and Golgi complexes that contribute to cellular secretions during mid elongation of the spermatids and the lack of a manchette. During acrosome formation, the acrosome granule is found in a basal rather than an apical position, which has been observed in previous studies. These similarities and differences observed during spermiogenesis may be helpful in elucidating the development of mature spermatozoal characters as well as aid in future phylogenetic analyses.


    Application of Multivariate curve resolution-alternating least square methods on the resolution of overlapping CE peaks from different separation conditions

    ELECTROPHORESIS, Issue 20 2007
    Fang Zhang
    Abstract This paper discusses the development of a new strategy to improve the resolution of overlapping CE peaks by using second-order multivariate curve resolution with alternating least squares (second-order MCR-ALS) methods. Several kinds of organic reagents are added, one at a time, to the buffers, and sets of overlapping peaks with different separations are obtained. An augmented matrix is formed from the corresponding matrices of the overlapping peaks and is then analyzed by the second-order MCR-ALS method in order to use all data information to improve the precision of the resolution. Similarity between the resolved unit spectrum and the true one is used to assess the quality of the solutions provided by this method. 3,4-Dihydropyrimidin-2-one derivatives (DHPOs) are used as model components and mixed artificially in order to obtain overlapping peaks. Three different impurity levels, 100, 20, and 10% relative to the main component, are used. With this strategy, the concentration profiles and spectra of impurities, which are no more than 10% of the main component, can be resolved from the overlapping peaks without pure standards participating in the analysis. The effects on the resolution of changes in the components' spectra in buffers with different organic reagents are also evaluated; they are slight and can thus be ignored in the analysis. Individual data matrices (two-way data) are also analyzed by using MCR-ALS and heuristic evolving latent projections (HELP) methods, and their results are compared with those obtained when MCR-ALS is applied to augmented data matrix (three-way data) analysis.
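
    The augmented-matrix idea in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes a bilinear model D = C S^T, stacks two simulated runs that share the same spectra, and enforces non-negativity by simple clipping (published MCR-ALS implementations use proper constrained solvers). The function name `mcr_als`, the Gaussian profiles, and the perturbed starting spectra are all hypothetical.

```python
import numpy as np

def mcr_als(D, S0, n_iter=100):
    """Alternate least-squares updates so that D is approximated by C @ S.T.

    Non-negativity is imposed by clipping, a deliberate simplification.
    """
    S = np.clip(np.asarray(S0, dtype=float), 0, None)
    for _ in range(n_iter):
        # Least-squares concentration profiles for fixed spectra, then clip.
        C = np.clip(D @ np.linalg.pinv(S.T), 0, None)
        # Least-squares spectra for fixed concentration profiles, then clip.
        S = np.clip(D.T @ np.linalg.pinv(C.T), 0, None)
    return C, S

def gaussian(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2)

# Simulated data (hypothetical): one main component and a 10% impurity.
t = np.linspace(0, 1, 60)   # migration-time axis
w = np.linspace(0, 1, 40)   # spectral axis
S_true = np.column_stack([gaussian(w, 0.35, 0.15), gaussian(w, 0.60, 0.15)])
C_run1 = np.column_stack([gaussian(t, 0.45, 0.08), 0.1 * gaussian(t, 0.55, 0.08)])
C_run2 = np.column_stack([gaussian(t, 0.40, 0.08), 0.1 * gaussian(t, 0.60, 0.08)])

# Column-wise augmentation: both runs share the spectral direction.
D_aug = np.vstack([C_run1 @ S_true.T, C_run2 @ S_true.T])

# Starting spectra: the true ones plus a small perturbation (an assumption
# made only to keep the demonstration short).
rng = np.random.default_rng(0)
S0 = S_true + 0.02 * rng.random(S_true.shape)

C_hat, S_hat = mcr_als(D_aug, S0)
rel_residual = np.linalg.norm(D_aug - C_hat @ S_hat.T) / np.linalg.norm(D_aug)
```

    Stacking the runs means every run contributes to the estimate of the shared spectra, which is the sense in which the augmented analysis uses "all data information".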


    Site scores and conditional biplots in canonical correspondence analysis

    ENVIRONMETRICS, Issue 1 2004
    Jan Graffelman
    Abstract Canonical correspondence analysis is an important multivariate technique in community ecology. It produces an interesting biplot that summarizes the data matrices involved in the analysis. The method produces two sets of site scores that can be used in a biplot. One set concerns site scores that are weighted averages of the species scores (WA scores), and the other set represents site scores that are linear combinations of the environmental variables (LC scores). We show that the use of both sets of scores in a CCA biplot can be justified. The use of the WA scores leads to the best possible representation of the species data conditional on the representation of the weighted averages. Likewise, the LC scores lead to the best possible representation of the environmental variables, also conditional on the representation of the weighted averages and on the use of a Mahalanobis metric. The eigenvalues obtained in CCA indicate how well the species data are represented when LC scores are used. The quality of representation of the species data when WA scores are used can be computed from the CCA eigenvalues and the variances of the WA scores. Scalar products between WA scores and environmental variable vectors do not form a biplot of the environmental data. Theoretical results are illustrated with Australian data from freshwater ecology. Copyright © 2003 John Wiley & Sons, Ltd.
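
    As a small illustration of the WA scores mentioned in the abstract above, the sketch below computes site scores as abundance-weighted averages of species scores. It does not perform a full CCA; the species scores are simply given, and the function name and the toy numbers are hypothetical.

```python
import numpy as np

def wa_site_scores(abundance, species_scores):
    """Site scores as abundance-weighted averages of species scores.

    abundance:      (n_sites, n_species) non-negative data matrix
    species_scores: (n_species, n_axes) scores on the ordination axes
    """
    Y = np.asarray(abundance, dtype=float)
    U = np.asarray(species_scores, dtype=float)
    # Each row of Y is normalised to sum to one, so every site score is a
    # convex combination of the scores of the species found at that site.
    return (Y @ U) / Y.sum(axis=1, keepdims=True)

# Toy example: 2 sites, 3 species, 1 ordination axis.
Y = [[1, 1, 0],
     [0, 2, 2]]
u = [[0.0], [2.0], [4.0]]
wa = wa_site_scores(Y, u)
```

    LC scores, by contrast, would be obtained as linear combinations of the environmental variables, that is, from a product of the environmental data matrix with fitted coefficients.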


    Enhancing molecular discovery using descriptor-free rearrangement clustering techniques for sparse data sets

    AICHE JOURNAL, Issue 2 2010
    Peter A. DiMaggio Jr.
    Abstract This article presents a descriptor-free method for estimating library compounds with desired properties from synthesizing and assaying minimal library space. The method works by identifying the optimal substituent ordering (i.e., the optimal encoding integer assignment to each functional group on every substituent site of the molecular scaffold) based on a global pairwise difference metric intended to capture smoothness of the compound library. The reordering can be accomplished via (i) a mixed-integer linear programming (MILP) model, (ii) a genetic algorithm based approach, or (iii) a heuristic approach. We present performance comparisons between these techniques as well as an independent analysis of characteristics of the MILP model. Two sparsely sampled data matrices provided by Pfizer are analyzed to validate the proposed approach, and we show that the rearrangement of these matrices leads to regular property landscapes which enable reliable property estimation/interpolation over the full library space. An iterative strategy for compound synthesis is also introduced that utilizes the results of the reordered data to direct the synthesis toward desirable compounds. We demonstrate in a simulated experiment using held-out subsets of the data that the proposed iterative technique is effective in identifying compounds with desired physical properties. © 2009 American Institute of Chemical Engineers AIChE J, 2010


    Ultrastructure of spermiogenesis in the Cottonmouth, Agkistrodon piscivorus (Squamata: Viperidae: Crotalinae)

    JOURNAL OF MORPHOLOGY, Issue 3 2010
    Kevin M. Gribbins
    Abstract To date, multiple studies have examined the morphology of spermatozoa. However, there are limited data detailing the ontogenic characters of spermiogenesis within squamates. Testicular tissues were collected from Cottonmouths (Agkistrodon piscivorus) and tissues from spermiogenically active months were analyzed ultrastructurally to detail the cellular changes that occur during spermiogenesis. The major events of spermiogenesis (acrosome formation, nuclear elongation/DNA condensation, and flagellar development) resemble those of other squamates; however, specific ultrastructural differences can be observed between Cottonmouths and other squamates studied to date. During acrosome formation, vesicles from the Golgi apparatus fuse at the apical surface of the nuclear membrane prior to making nuclear contact. At this stage, the acrosome granule can be observed in a centralized location within the vesicle. As elongation commences, the acrosome complex becomes highly compartmentalized and migrates laterally along the nucleus. Parallel and circum-cylindrical microtubules (components of the manchette) are observed, with parallel microtubules outnumbering the circum-cylindrical microtubules. Flagella, displaying the conserved 9 + 2 microtubule arrangement, sit in nuclear fossae that have electron-lucent shoulders juxtaposed on either side of the spermatid's basal plates. This study aims to provide developmental characters for squamates in the subfamily Crotalinae, family Viperidae, which may be useful for histopathological studies on spermatogenesis in semi-aquatic species exposed to pesticides. Furthermore, these data may in the near future provide morphological characters for spermiogenesis that can be added to morphological data matrices for use in phylogenetic analyses. J. Morphol. 2010. © 2009 Wiley-Liss, Inc.


    Physical characterization of component particles included in dry powder inhalers.

    JOURNAL OF PHARMACEUTICAL SCIENCES, Issue 5 2007

    Abstract The characterization of particles included in dry powder inhalers is extended from our previous report (in this journal) to include properties related to their dynamic performance. The performance of dry powder aerosols for pulmonary delivery is known to depend on fluidization and dispersion, which reflect particle interactions in static powder beds. Since the solid state, surface/interfacial chemistry and static bulk properties were assessed previously, it remains to describe dynamic performance with a view to interpreting the integrated database. These studies result in complex data matrices from which correlations between specific properties and performance may be deduced. Lactose particles were characterized in terms of their dynamic flow, powder and aerosol electrostatics, and aerodynamic performance with respect to albuterol aerosol dispersion. There were clear correlations between flow properties and aerosol dispersion that would allow selection of lactose particles for formulation. Moreover, these properties can be related to data reported earlier on the morphological and surface properties of the carrier lactose particles. The proposed series of analytical approaches to the evaluation of powders for inclusion in aerosol products has merit and may be the basis for screening and ultimately predicting particle performance with a view to formulation optimization. © 2007 Wiley-Liss, Inc. and the American Pharmacists Association J Pharm Sci 96: 1302–1319, 2007


    An improved method for searching plant functional types by numerical analysis

    JOURNAL OF VEGETATION SCIENCE, Issue 3 2003
    Valério DePatta Pillar
    Abstract. The use of plant functional types (PFTs) to describe patterns and processes in plant communities has become essential to study and predict consequences of global change on vegetation and ecosystem processes. A PFT is a group of plants that, irrespective of phylogeny, are similar in a given set of traits and similar in their association with certain variables, which may be factors to which the plants are responding or effects of the plants in the ecosystem. To define PFTs, relevant traits must be selected and an appropriate method must be used to classify plants into types. We critically review methods used for the analysis of PFT-based data and describe a new recursive algorithm to numerically search for traits and find optimal PFTs. The algorithm uses three data matrices, describing populations by traits, communities by these populations, and community sites by environmental factors or effects. It defines PFTs polythetically by cluster analysis, revealing plant types whose performance in communities is maximally associated with the specified environmental variables. We test the method with data from natural grassland communities of southern Brazil, which were experimentally subjected to combinations of grazing levels and N-fertilizer. The new method is found to be better than similar analytical procedures previously described. Redundancy among traits is discussed and a procedure for comparing alternative solutions is presented based on the similarity in terms of PFT responses between different trait subsets. The concept of PFT response group is illustrated by example.


    The role of character loss in phylogenetic reconstruction as exemplified for the Annelida

    JOURNAL OF ZOOLOGICAL SYSTEMATICS AND EVOLUTIONARY RESEARCH, Issue 4 2007
    C. Bleidorn
    Abstract Annelid relationships are controversial, and molecular and morphological analyses provide incongruent estimates. Character loss is identified as a major confounding factor for phylogenetic analyses based on morphological data. A direct approach and an indirect approach for the identification of character loss are discussed. Character loss can frequently be found within annelids and examples of the loss of typical annelid characters, like chaetae, nuchal organs, coelomic cavities and other features, are given. A loss of segmentation is suggested for Sipuncula and Echiura; both are supported as annelid ingroups in molecular phylogenetic analyses. Moreover, character loss can be caused by some modes of heterochronic evolution (paedomorphosis) and, as shown for orbiniid and arenicolid polychaetes, paedomorphic taxa might be misplaced in phylogenies derived from morphology. Different approaches for dealing with character loss in cladistic analyses are discussed. Application of asymmetrical character state transformation costs or usage of a dynamic homology framework represent promising approaches. Identifying character loss prior to a phylogenetic analysis will help to refine morphological data matrices and improve phylogenetic analyses of annelid relationships. Summary The phylogeny of the Annelida remains controversial, and morphological and molecular analyses yield differing results. Character loss can mislead phylogenetic analyses of morphological data. This paper presents a direct and an indirect approach for detecting character loss. It shows that character loss occurs frequently within annelids and can affect typical annelid characters such as chaetae, nuchal organs or coelomic cavities. Molecular phylogenetic analyses support a position of Echiura and Sipuncula within the annelids, so a loss of segmentation must be assumed for these taxa. It is demonstrated that character loss can be caused by heterochronic evolution. Using orbiniids and arenicolids as examples, it is shown how paedomorphic taxa are misplaced in cladistic analyses of morphological data. Various approaches to handling character loss in morphological data sets are presented and discussed. Among these, asymmetric character transformation costs and dynamic homology hypotheses are promising. However, all approaches require phylogenetic hypotheses generated from an analysis of independent data (e.g., molecules) in order to identify character loss reliably.


    Phylogeny and Systematic Position of Opiliones: A Combined Analysis of Chelicerate Relationships Using Morphological and Molecular Data

    CLADISTICS, Issue 1 2002
    Gonzalo Giribet
    The ordinal level phylogeny of the Arachnida and the suprafamilial level phylogeny of the Opiliones were studied on the basis of a combined analysis of 253 morphological characters, the complete sequence of the 18S rRNA gene, and the D3 region of the 28S rRNA gene. Molecular data were collected for 63 terminal taxa. Morphological data were collected for 35 exemplar taxa of Opiliones, but groundplans were applied to some of the remaining chelicerate groups. Six extinct terminals, including Paleozoic scorpions, are scored for morphological characters. The data were analyzed using strict parsimony for the morphological data matrix and via direct optimization for the molecular and combined data matrices. A sensitivity analysis of 15 parameter sets was undertaken, and character congruence was used as the optimality criterion to choose among competing hypotheses. The results obtained are unstable for the high-level chelicerate relationships (except for Tetrapulmonata, Pedipalpi, and Camarostomata), and the sister group of the Opiliones is not clearly established, although the monophyly of Dromopoda is supported under many parameter sets. However, the internal phylogeny of the Opiliones is robust to parameter choice and allows the discarding of previous hypotheses of opilionid phylogeny such as the "Cyphopalpatores" or "Palpatores." The topology obtained is congruent with the previous hypothesis of "Palpatores" paraphyly as follows: (Cyphophthalmi (Eupnoi (Dyspnoi + Laniatores))). Resolution within the Eupnoi, Dyspnoi, and Laniatores (the latter two united as Dyspnolaniatores nov.) is also stable to the superfamily level, permitting a new classification system for the Opiliones.


    Criminal cognitions and personality: what does the PICTS really measure?

    CRIMINAL BEHAVIOUR AND MENTAL HEALTH, Issue 3 2000
    Dr Vincent Egan
    Introduction: The Psychological Inventory of Criminal Thinking Styles (PICTS) is a measure of the criminal cognitions and thinking styles that maintain offending. The scale comprises 8 a priori thinking styles and two validation scales, the validation scales having been found to be unreliable. Owing to the large amount of apparently shared variance in the original validation study, this data matrix needs re-analysis. Results from the PICTS were examined in relation to general measures of individual differences, in order to link the PICTS to the broader literature on the characteristics of offenders. Method: The original PICTS data matrix was re-analysed using a more parsimonious method of analysis. The PICTS was also given to 54 detained, mentally disordered offenders along with the NEO-Five Factor Inventory, the Sensation-Seeking Scale (SSS), the Attention Deficit Scales for Adults (ADSA) and, as a measure of general intelligence, the Standard Progressive Matrices. Results: Principal components analysis suggested that the PICTS really comprised two factors: a lack of thoughtfulness (i.e. lack of attention to one's experience), and wilful hostility, with the first factor being the better defined. Intelligence was not associated with any factor of criminal thinking style. High scores on the ADSA and on the Disinhibition and Boredom Susceptibility subscales of the SSS were associated with much greater endorsement of criminal sentiments; high Neuroticism, low Extroversion, and low Agreeableness were slightly weaker correlates. Discussion: The issues involved in criminogenic cognitions need clarification and need to be linked to the broader literature on cognitive distortions and personality. Interventions targeted at dismantling impulsive destructive behaviour, whether it be thoughtlessness or wilful hostility, may be effected by increasing thinking skills, so breaking down the cognitions that maintain criminal behaviour. Copyright © 2000 Whurr Publishers Ltd.


    Occurrence of toxic cyanobacterial blooms in San Roque Reservoir (Córdoba, Argentina): A field and chemometric study

    ENVIRONMENTAL TOXICOLOGY, Issue 3 2003
    María Valeria Amé
    Abstract We evaluated the presence of cyanobacterial blooms in San Roque Reservoir (Córdoba, Argentina). Cyanobacterial blooms and water samples were collected over 4 years (1998–2002). We confirmed the presence of microcystin-LR and microcystin-RR in 97% of these blooms. The total amount of microcystin (MC) ranged between 5.8 and 2400.0 µg g⁻¹ of freeze-dried bloom material. These values suggest that guidelines for safe water consumption and recreational use should be established for this reservoir. Twenty-eight physical and chemical parameters were measured in water samples and evaluated by discriminant analysis (DA). A first DA was used to evaluate the factors promoting cyanobacteria occurrence, identifying nine parameters following three patterns associated with cyanobacterial growth. Inorganic phosphorus was found to promote the presence of blooms, whereas the highest proliferation of cyanobacteria was observed in the presence of smaller amounts of carbonate, bicarbonate, sulfate, and fecal coliform bacteria. The results observed during our fieldwork, analyzed using DA, agreed with the results of other laboratory studies, thus confirming the usefulness of DA to help with the evaluation of a complicated environmental data matrix. A second DA, using only water samples collected during the presence of cyanobacteria blooms, identified another nine parameters. The analysis of these parameters allowed us to identify certain environmental factors that could lead to the dominance of toxic strains, thus increasing the amount of MC. The results showed that, in our case, an increase in the water temperature was associated with higher amounts of MC per dry weight unit, whereas increases in the concentrations of ammonia-nitrogen and iron were associated with lower amounts of MC, thus disfavoring the dominance of toxic strains. © 2003 Wiley Periodicals, Inc. Environ Toxicol 18: 192–201, 2003.


    Ground-roll attenuation using a 2D time-derivative filter

    GEOPHYSICAL PROSPECTING, Issue 3 2009
    Paulo E. M. Melo
    ABSTRACT We present a new filtering method for the attenuation of ground-roll. The method is based on the application of a bi-dimensional filter for obtaining the time-derivative of the seismograms. Before convolving the filter with the input data matrix, the normal moveout correction is applied to the seismograms with the purpose of flattening the reflections. The method can locally attenuate the amplitude of data of low frequency (in the ground-roll and stretch normal moveout region) and enhance flat events (reflections). The filtered seismograms can reveal horizontal or sub-horizontal reflections while vertical or sub-vertical events, associated with ground-roll, are attenuated. A regular set of samples around each neighbourhood data sample of the seismogram is used to estimate the time-derivative. A numerical approximation of the derivative is computed by taking the difference between the interpolated values calculated in both the positive and the negative neighbourhood of the desired position. The coefficients of the 2D time-derivative filter are obtained by taking the difference between two filters that interpolate at positive and negative times. Numerical results that use real seismic data show that the proposed method is effective and can reveal reflections masked by the ground-roll. Another benefit of the method is that the stretch mute, normally applied after the normal moveout correction, is unnecessary. The new filtering approach provides results of outstanding quality when compared to results obtained from the conventional FK filtering method.
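
    The reason a time-derivative attenuates low-frequency energy can be seen with a much simpler operator than the one in the abstract above. The sketch below uses a plain central difference along the time axis of a data matrix; it is a one-dimensional stand-in, not the authors' 2D interpolation-based filter, and the function name, sampling interval, and test frequencies are hypothetical.

```python
import numpy as np

def time_derivative(data, dt):
    """Central-difference time derivative of a (n_time, n_traces) matrix.

    The first and last time samples are left at zero.
    """
    data = np.asarray(data, dtype=float)
    out = np.zeros_like(data)
    out[1:-1] = (data[2:] - data[:-2]) / (2.0 * dt)
    return out

# Two synthetic traces with equal amplitude: a 5 Hz trace standing in for
# ground-roll and a 40 Hz trace standing in for a reflection.
dt = 0.001
t = np.arange(1000) * dt
gather = np.column_stack([np.sin(2 * np.pi * 5 * t),
                          np.sin(2 * np.pi * 40 * t)])

filtered = time_derivative(gather, dt)
low_amp = np.abs(filtered[:, 0]).max()
high_amp = np.abs(filtered[:, 1]).max()
```

    Because differentiation scales each frequency component roughly in proportion to its frequency, the 40 Hz trace comes out of the filter with a much larger amplitude than the 5 Hz trace, although both went in with the same amplitude.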


    Exact multivariate tests for brain imaging data

    HUMAN BRAIN MAPPING, Issue 1 2002
    Rita Almeida
    Abstract In positron emission tomography (PET) and functional magnetic resonance imaging (fMRI) data sets, the number of variables is larger than the number of observations. This fact makes application of multivariate linear model analysis difficult, except if a reduction of the data matrix dimension is performed prior to the analysis. The reduced data set, however, will in general not be normally distributed and therefore, the usual multivariate tests will not be necessarily applicable. This problem has not been adequately discussed in the literature concerning multivariate linear analysis of brain imaging data. No theoretical foundation has been given to support that the null distributions of the tests are as claimed. Our study addresses this issue by introducing a method of constructing test statistics that follow the same distributions as when the data matrix is normally distributed. The method is based on the invariance of certain tests over a large class of distributions of the data matrix. This implies that the method is very general and can be applied for different reductions of the data matrix. As an illustration we apply a test statistic constructed by the method now presented to test a multivariate hypothesis on a PET data set. The test rejects the null hypothesis of no significant differences in measured brain activity between two conditions. The effect responsible for the rejection of the hypothesis is characterized using canonical variate analysis (CVA) and compared with the result obtained by using univariate regression analysis for each voxel and statistical inference based on size of activations. The results obtained from CVA and the univariate method are similar. Hum. Brain Mapping 16:24–35, 2002. © 2002 Wiley-Liss, Inc.
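
    The abstract above does not say which reduction of the data matrix is used. As one common possibility, the sketch below projects a matrix with far more variables than observations onto its leading principal components via the singular value decomposition. The choice of SVD, the function name, and the simulated dimensions are assumptions made for illustration.

```python
import numpy as np

def reduce_data_matrix(X, k):
    """Project an (n_obs, n_vars) matrix onto its first k principal components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                       # centre each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * s[:k]                       # (n_obs, k) component scores

# Simulated data: 12 scans (observations) and 500 voxels (variables).
rng = np.random.default_rng(1)
X = rng.normal(size=(12, 500))
scores = reduce_data_matrix(X, k=3)
gram = scores.T @ scores
```

    The reduced matrix has as many rows as the original but only k columns, and its columns are mutually orthogonal, so a multivariate linear model becomes estimable. As the abstract notes, such a reduced data set is in general no longer normally distributed.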


    Historical biogeography of some river basins in central Mexico evidenced by their goodeine freshwater fishes: a preliminary hypothesis using secondary Brooks parsimony analysis

    JOURNAL OF BIOGEOGRAPHY, Issue 8 2006
    Omar Domínguez-Domínguez
    Abstract Aims: Our aim was to uncover and describe patterns of historical biogeography of the main river basins in central Mexico, based on a secondary Brooks parsimony analysis (BPA) of goodeine fishes, and to understand the processes that determine them with respect to the molecular clock of the goodeines and the geological events that have taken place in the region since the Miocene. Location: The region covered in this study includes central Mexico, mostly the so-called Mesa Central of Mexico, an area argued to be a transitional zone comprising several major river drainages from their headwaters at high elevations along the Transmexican Volcanic Belt to the coast of the Gulf of Mexico and the Pacific Ocean. Methods: Based on a previous phylogenetic hypothesis regarding the Goodeidae, we built a data matrix using additive binary coding. First, we conducted a primary BPA to provide general explanations of the historical biogeography of Central Mexico. As ambiguity was found, a secondary BPA was conducted, and some areas were duplicated in order to explain the reticulated history of the area. Area cladograms were obtained by running a parsimony analysis. Instances of vicariance and non-vicariance processes were described with reference to the cladogram obtained from secondary BPA. Results: The study area was divided into 18 discrete regions. Primary BPA produced nine equally parsimonious cladograms with 129 steps, and a consistency index (CI) of 0.574. A strict consensus cladogram shows low resolution among some areas, but other area relationships are consistent. For secondary BPA, five of the 18 regions were duplicated (LEA, COT, AYU, CUT, PAN); one was triplicated (BAL); and one was quadruplicated (AME), suggesting that the pattern of distribution of species in these areas reflects multiple independent events. These areas correspond with the regions exhibiting the highest levels of diversification and the most complex geological history, and those for which river piracy events or basin connections have been proposed. The secondary BPA produced a single most parsimonious cladogram with 118 steps, and a CI of 0.858. This cladogram shows that none of the duplicated areas are nested together, reinforcing the idea of a reticulated history of the areas and not a single vicariant event. Main conclusions: Although our results are preliminary and we cannot establish this as a general pattern, as the BPA is based on a single-taxon cladogram, resolution obtained in the secondary BPA provides some insights regarding the historical biogeography of this group of fishes in river basins of central Mexico. Secondary BPA indicates that the historical biogeography of central Mexico, as shown by their goodeine freshwater fishes, is complex and is a result of a series of vicariant and non-vicariant events such as post-dispersal speciation and post-speciation dispersal.
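
    The additive binary coding step named in the Methods of the abstract above can be sketched as follows: every terminal taxon and every internal node of the phylogeny becomes a column, and an area is scored 1 for each taxon it contains and for all ancestors of that taxon. The three-species tree, the area names, and the function name are hypothetical and are not taken from the goodeine data set.

```python
def additive_binary_coding(parent, distribution):
    """Build an area-by-node binary matrix from a rooted tree.

    parent:       dict mapping each node to its parent (root maps to None)
    distribution: dict mapping each area to the set of terminal taxa found there
    Returns the column labels and a dict of area -> row of 0/1 codes.
    """
    columns = sorted(parent)
    matrix = {}
    for area, taxa in distribution.items():
        present = set()
        for taxon in taxa:
            node = taxon
            # Walk from the terminal taxon up to the root, marking ancestors.
            while node is not None:
                present.add(node)
                node = parent[node]
        matrix[area] = [1 if col in present else 0 for col in columns]
    return columns, matrix

# Hypothetical tree ((sp_A, sp_B) node_X, sp_C) node_Y.
parent = {
    "sp_A": "node_X",
    "sp_B": "node_X",
    "node_X": "node_Y",
    "sp_C": "node_Y",
    "node_Y": None,
}
distribution = {"area_1": {"sp_A"}, "area_2": {"sp_B"}, "area_3": {"sp_C"}}
columns, matrix = additive_binary_coding(parent, distribution)
```

    In this toy case area_1 and area_2 share the code for node_X, so a parsimony analysis of the matrix would group them before joining area_3. Duplicating an area for secondary BPA amounts to splitting its row into two rows that each carry part of its taxa.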


    A diagonal measure and a local distance matrix to display relations between objects and variables

    JOURNAL OF CHEMOMETRICS, Issue 1 2010
    Gergely Tóth
    Abstract Proper permutation of data matrix rows and columns may result in plots showing striking information on the objects and variables under investigation. To control the permutation, a diagonal matrix measure D was first defined expressing the size relations of the matrix elements. D is essentially the absolute norm of a matrix where the matrix elements are weighted by their distance to the matrix diagonal. Changing the order of rows and columns increases or decreases D. A Monte Carlo technique was used to achieve maximum D in the case of the object distance matrix, or minimal D in the case of the variable correlation matrix, to get similar objects or variables close together. Secondly, a local distance matrix was defined, where an element reflects the distances of neighboring objects in a limited subspace of the variables. Due to the maximization of D in the local distance matrix by row and column changes of the original data matrix, the similar objects were arranged close to each other and simultaneously the variables responsible for their similarity were collected close to the diagonal part defined by these objects. This combination of the diagonal measure and the local distance matrix seems to be an efficient tool in the exploration of hidden similarities of a data matrix. Copyright © 2009 John Wiley & Sons, Ltd.
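
    One possible reading of the diagonal measure in the abstract above is sketched below. The abstract gives no formula, so the weighting |i - j| and the random pairwise-swap search are assumptions, and the function names and toy data are hypothetical. The search is a simple stochastic hill climb and is not claimed to match the Monte Carlo scheme of the paper.

```python
import numpy as np

def diagonal_measure(A):
    """Sum of |a_ij| weighted by the index distance |i - j| (assumed form of D)."""
    A = np.asarray(A, dtype=float)
    i, j = np.indices(A.shape)
    return float(np.sum(np.abs(A) * np.abs(i - j)))

def maximize_by_swaps(A, n_trials=2000, seed=0):
    """Randomly swap pairs of objects, keeping swaps that increase D.

    The same permutation is applied to rows and columns of the square matrix A.
    """
    A = np.asarray(A, dtype=float)
    rng = np.random.default_rng(seed)
    order = np.arange(A.shape[0])
    best = diagonal_measure(A)
    for _ in range(n_trials):
        a, b = rng.choice(A.shape[0], size=2, replace=False)
        trial = order.copy()
        trial[[a, b]] = trial[[b, a]]
        d = diagonal_measure(A[np.ix_(trial, trial)])
        if d > best:
            order, best = trial, d
    return order, best

# Toy object distance matrix for points on a line, listed in scrambled order.
points = np.array([0.0, 5.0, 1.0, 6.0, 2.0])
dist = np.abs(points[:, None] - points[None, :])
start = diagonal_measure(dist)
order, best = maximize_by_swaps(dist)
```

    Maximizing D on a distance matrix pushes large distances away from the diagonal, which leaves small distances, and hence similar objects, next to one another.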


    Determination of rank by median absolute deviation (DRMAD): a simple method for determining the number of principal factors responsible for a data matrix

    JOURNAL OF CHEMOMETRICS, Issue 1 2009
    Edmund R. Malinowski
    Abstract Median absolute deviation (MAD) is a well-established statistical method for determining outliers. This simple statistic can be used to determine the number of principal factors responsible for a data matrix by direct application to the residual standard deviation (RSD) obtained from principal component analysis (PCA). Unlike many other popular methods, the proposed method, called determination of rank by MAD (DRMAD), does not involve the use of pseudo degrees of freedom, pseudo F-tests, extensive calibration tables, time-consuming iterations, or empirical procedures. The method does not require strict adherence to normal distributions of experimental uncertainties. The computations are direct, simple to use and extremely fast, ideally suitable for online data processing. The results obtained using various sets of chemical data previously reported in the chemical literature agree with the early work. Limitations of the method, determined from model data, are discussed. An algorithm, written in MATLAB format, is presented in the Appendix. Copyright © 2008 John Wiley & Sons, Ltd. [source]
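
As background, the generic MAD outlier rule can be sketched in a few lines. This is not the DRMAD algorithm itself, whose details (how the rule is applied to the RSD values) are not given in the abstract; the scaling constant 1.4826 and the cutoff of 3 are conventional choices assumed here, and the function name and sample values are hypothetical.

```python
from statistics import median


def mad_outliers(values, threshold=3.0):
    """Return indices of values lying more than `threshold` scaled MADs
    from the median (generic rule, not the published DRMAD procedure)."""
    center = median(values)
    mad = median([abs(v - center) for v in values])
    scale = 1.4826 * mad  # conventional consistency factor for normal data
    if scale == 0:
        return []  # no spread to measure against
    return [i for i, v in enumerate(values)
            if abs(v - center) > threshold * scale]


# Hypothetical values: four similar entries and one that stands apart.
flagged = mad_outliers([1.0, 1.1, 0.9, 1.05, 5.0])
```

Because both the center and the spread are medians, a single extreme value has little influence on the cutoff, which is the property that makes MAD attractive for this kind of screening.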


    PHYLOGENETIC SYSTEMATICS OF THE ULVACEAE (ULVALES, ULVOPHYCEAE) USING CHLOROPLAST AND NUCLEAR DNA SEQUENCES

    JOURNAL OF PHYCOLOGY, Issue 6 2002
    Hillary S. Hayden
    Systematic hypotheses for the Ulvaceae were tested using phylogenetic analysis of sequences for the gene encoding the large subunit of RUBISCO, small subunit rDNA and a combined data matrix. Representatives of eight putative ulvaceous genera and twelve additional taxa from the Ulvophyceae and Trebouxiophyceae were included in analyses using maximum parsimony and maximum likelihood criteria. Molecular data supported hypotheses for the Ulvaceae that are based on the early development of vegetative thalli and motile cell ultrastructure. Ulvaceae sensu Floyd and O'Kelly, including Percursaria Bory de Saint-Vincent, Ulvaria Ruprecht and a complex of closely related species of Chloropelta Tanner, Enteromorpha Link and Ulva L. was supported; however, monophyly of Enteromorpha and Ulva was not supported. The Ulvales and Ulotrichales sensu Floyd and O'Kelly were monophyletic. Blidingia Kylin and Kornmannia Bliding were allied with the former and Capsosiphon Gobi with the latter, although relationships among these and other taxa in these orders remain uncertain. The Ulvales are characterized by an isomorphic life history pattern, gametangia and sporangia that are identical in structure and development, motile cells with bilobed terminal caps and proximal sheaths consisting of two equal subunits. Method of motile cell release and the gross morphology of vegetative thalli are not systematically reliable characters. [source]


    Diagnostics for multivariate imputations

    JOURNAL OF THE ROYAL STATISTICAL SOCIETY: SERIES C (APPLIED STATISTICS), Issue 3 2008
    Kobi Abayomi
    Summary. We consider three sorts of diagnostics for random imputations: displays of the completed data, which are intended to reveal unusual patterns that might suggest problems with the imputations; comparisons of the distributions of observed and imputed data values; and checks of the fit of observed data to the model that is used to create the imputations. We formulate these methods in terms of sequential regression multivariate imputation, which is an iterative procedure in which the missing values of each variable are randomly imputed conditionally on all the other variables in the completed data matrix. We also consider a recalibration procedure for sequential regression imputations. We apply these methods to the 2002 environmental sustainability index, which is a linear aggregation of 64 environmental variables on 142 countries. [source]


    High-resolution NMR correlation experiments in a single measurement (HR-PANACEA)

    MAGNETIC RESONANCE IN CHEMISTRY, Issue 5 2010
    Ēriks Kupče
    Abstract Three important NMR pulse sequences, INADEQUATE, HSQC and three-dimensional HMBC, have been combined into a single entity called high-resolution Parallel Acquisition NMR: an All-in-one Combination of Experimental Applications (HR-PANACEA) to provide reliable structural information about a small molecule in a single measurement. This exploits a recent instrumental development that permits simultaneous acquisition of signals from several nuclear species, using multiple receivers. Where high-precision values of the long-range heteronuclear splittings are important, selected regions of a large experimental data matrix are extracted and examined with the highest possible resolution. The J-doubling technique is then applied to derive precise values for these couplings. As proof of principle, the method is applied to the molecule of methyl salicylate, confirming the expected conformation of the COH moiety. Copyright © 2010 John Wiley & Sons, Ltd. [source]


    Lanczos, Householder transformations, and implicit deflation for fast and reliable dominant singular subspace computation

    NUMERICAL LINEAR ALGEBRA WITH APPLICATIONS, Issue 4 2001
    Ricardo D. Fierro
    Abstract Many applications, such as subspace-based models in information retrieval and signal processing, require the computation of singular subspaces associated with the k dominant, or largest, singular values of an m×n data matrix A, where k ≪ min(m,n). Frequently, A is sparse or structured, which usually means matrix-vector multiplications involving A and its transpose can be done with much less than O(mn) flops, and A and its transpose can be stored with much less than O(mn) storage locations. Many Lanczos-based algorithms have been proposed through the years because the underlying Lanczos method only accesses A and its transpose through matrix-vector multiplications. We implement a new algorithm, called KSVD, in the Matlab environment for computing approximations to the singular subspaces associated with the k dominant singular values of a real or complex matrix A. KSVD is based upon the Lanczos tridiagonalization method, the WY representation for storing products of Householder transformations, implicit deflation, and the QR factorization. Our Matlab simulations suggest it is a fast and reliable strategy for handling troublesome singular-value spectra. Copyright © 2001 John Wiley & Sons, Ltd. [source]
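
The point that A is touched only through products with A and its transpose can be illustrated with a much simpler method than the one described. The sketch below uses plain power iteration on AᵀA for the single largest singular value (k = 1); it is not KSVD and not a Lanczos method, and all function names are hypothetical.

```python
import math


def matvec(a, x):
    """Compute A x for a dense list-of-rows matrix."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def rmatvec(a, y):
    """Compute A^T y without forming the transpose explicitly."""
    return [sum(a[i][j] * y[i] for i in range(len(a)))
            for j in range(len(a[0]))]


def dominant_singular_value(a, iterations=200):
    """Power iteration on A^T A; A is used only via matvec and rmatvec."""
    n = len(a[0])
    v = [1.0 / math.sqrt(n)] * n  # assumed start, must not be orthogonal
    sigma = 0.0                   # to the dominant right singular vector
    for _ in range(iterations):
        w = rmatvec(a, matvec(a, v))
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v
        v = [x / norm for x in w]
        sigma = math.sqrt(sum(x * x for x in matvec(a, v)))
    return sigma, v


sigma, v = dominant_singular_value([[3.0, 0.0], [0.0, 1.0]])
```

For a sparse or structured A, only `matvec` and `rmatvec` would need to change; power iteration converges slowly when the leading singular values are close together, which is one reason Lanczos-type methods are usually preferred.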


    Systematics of Chaetocerotaceae (Bacillariophyceae).

    PHYCOLOGICAL RESEARCH, Issue 2 2003

    SUMMARY In order to construct a model of evolutionary relationships within the diatom family Chaetocerotaceae, 37 species of Chaetoceros Ehrenberg, representing all subgenera and 21 of 22 subgeneric sections of the genus, plus three Bacteriastrum Shadbolt species, representing both of its subgeneric sections, were subjected to cladistic analysis. One species each of Eucampia Ehrenberg, Cerataulina Peragallo, Hemiaulus Ehrenberg, Attheya West and Gonioceros H. & M. Peragallo were used as outgroups. A matrix of 65 binary and multistate morphological characters was constructed, with data being gathered from original observation of material in the light and electron microscopes, and from the published literature. The analysis yielded 36 most-parsimonious cladograms of 316 steps; incongruence between trees is largely restricted to some taxa representing undersampled sections of Chaetoceros subg. Hyalochaete. The robustness of this hypothesis was examined in several ways. To assess the effect of character weighting, the bootstrap was used to randomly weight characters. The parsimony criterion was relaxed via a decay index, and finally, the tree length was compared to that of trees randomly generated from the data matrix. The majority of investigated species of Chaetoceros subg. Phaeoceros, Chaetoceros subg. Hyalochaete and Bacteriastrum appear to belong to a continuous grade, rather than comprising individual clades. Chaetoceros is paraphyletic. Thus, the traditional classification does not accurately reflect the hypothesized phylogenetic relationships of this family. [source]


    Origin of Fueguian-Patagonians: An approach to population history and structure using R matrix and matrix permutation methods

    AMERICAN JOURNAL OF HUMAN BIOLOGY, Issue 3 2002
    Rolando González José
    A complicated history of isolation between Fueguian and Patagonian groups (originated by the appearance of the Straits of Magellan) as much as differences in population structure and life strategies constitute important factors in the clustering pattern of those groups. The aim of this work was to test several hypotheses about population structure and history of Fueguian-Patagonians to propose a model that incorporates predictions for future studies. R matrix methods and matrix permutation analyses were performed upon a data matrix of craniofacial measurements of 441 skulls divided into nine samples pertaining to six Patagonian and three Fueguian populations. Association of biological distances with three matrices representing several settlement patterns was tested using matrix permutation tests. Results of R matrix study show that the minimum genetic distance obtained confirms separation between Fueguians and Patagonians. Moreover, an analysis of residual variances from the expected regression line confirms admixture between Andean and Pampean populations and Araucanian groups, consistent with ethnohistorical observations. A model representing a long history of isolation between Fueguian and Patagonians, rather than a model emphasizing differences in life-strategies, presented the best correlation with the biological distance matrix. Because similar results were already obtained in archaeological, molecular, and morphological studies, a model for the settlement of Tierra del Fuego is proposed. It is summarized by four main hypotheses that can be tested independently by different disciplines in the future. Am. J. Hum. Biol. 14:308–320, 2002. © 2002 Wiley-Liss, Inc. [source]
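
A matrix permutation test of the kind mentioned above is commonly carried out as a Mantel test. The sketch below shows that general technique under stated assumptions: Pearson correlation of the upper-triangle entries, a one-sided p-value, and joint row/column shuffling of one matrix. The abstract does not say which variant the authors used, and the function names and example data are hypothetical.

```python
import random


def upper_triangle(m):
    """Off-diagonal entries above the diagonal of a square matrix."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation; assumes neither sequence is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mantel_test(d1, d2, permutations=999, seed=0):
    """Correlate two distance matrices; assess significance by shuffling
    the rows and columns of d2 together."""
    rng = random.Random(seed)
    observed = pearson(upper_triangle(d1), upper_triangle(d2))
    order = list(range(len(d1)))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)
        shuffled = [[d2[i][j] for j in order] for i in order]
        if pearson(upper_triangle(d1), upper_triangle(shuffled)) >= observed:
            hits += 1
    # The observed arrangement counts as one of the permutations.
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical example: a distance matrix compared with itself.
points = [0, 1, 3, 7]
d = [[abs(a - b) for b in points] for a in points]
r, p = mantel_test(d, d, permutations=99)
```

Rows and columns are permuted together because the entries of a distance matrix are not independent observations, so an ordinary correlation test on the entries would overstate significance.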


    Identification of discriminant factors after treatment of resistant and susceptible banana leaves with Fusarium oxysporum f. sp. cubense culture filtrates

    PLANT BREEDING, Issue 1 2005
    B. Companioni
    Abstract Among the most important crops in developing countries are banana and plantain. However, the production is threatened by increasingly virulent forms of Fusarium wilt, and therefore, intensive breeding programmes are being carried out worldwide. As conventional field studies of banana resistance to this disease are time-consuming and destructive, an easy-to-do procedure was previously developed to differentiate field-grown resistant and susceptible banana cultivars at leaf level. Such a procedure involved the in vitro treatment of fungal culture filtrates on to field-grown adult leaves and the measurement of lesion areas 48 h later. The present report includes measurements of other indicators such as biochemical compounds. The cultivar 'Gross Michel' (susceptible) and cv. 'FHIA-01' (resistant) leaves were treated with Fusarium oxysporum f. sp. cubense race 1 culture filtrates. Evaluations were performed 48 h after leaf treatment. Compared with culture medium-treated leaves (control treatment), fungal metabolites produced leaf lesions, decreased free phenolic contents and increased protein levels in both cultivars. In 'FHIA-01', the culture filtrate increased contents of cell wall-linked phenolics and the pool of aldehydes (except malondialdehyde). Fungal metabolites did not cause variations in peroxidase activity, chlorophyll pigment contents or malondialdehyde level in any cultivar. The use of Fisher's linear discriminant analysis to differentiate resistant and susceptible banana cultivars in breeding programmes is also a novel aspect of this report. Such an estimation was performed from a data matrix that included the effects of the fungal metabolites (leaf lesion area and levels of free and cell wall-linked phenolics, aldehydes, except malondialdehyde, and proteins) on banana leaves of seven cultivars (four susceptible and three resistant). [source]


    Comparative analysis of gene expression on mRNA and protein level during development of Streptomyces cultures by using singular value decomposition

    PROTEINS: STRUCTURE, FUNCTION AND BIOINFORMATICS, Issue 21 2007
    Jiri Vohradsky Dr.
    Abstract This paper describes a comparative systems level analysis of the developmental proteome and transcriptome in the model antibiotic-producing eubacterium Streptomyces coelicolor, cultured on different media. The analysis formulates expression as the superposition of effects of regulatory networks and biological processes which can be identified using singular value decomposition (SVD) of a data matrix formed by time series measurements of expression of individual genes throughout the cell cycle of the bacterium. SVD produces linearly orthogonal factors, each of which can represent an independent system behavior defined by a linear combination of the genes/proteins highly correlated with the corresponding factor. By using SVD of the developmental time series of gene expression, as measured by both protein and RNA levels, we show that on the highest level of control (representing the basic kinetic behavior of the population), the results are identical, regardless of the type of experiment or cultivation method. The results show that this approach is capable of identifying basic regulatory processes independent of the environment in which the organism lives. It also shows that these processes are manifested equally on protein and RNA levels. Biological interpretation of the correlation of the genes and proteins with significant eigenprofiles (representing the highest level kinetic behavior of protein and/or RNA synthesis) revealed their association with metabolic processes, stress responses, starvation, and secondary metabolite production. [source]


    A Method for Evaluating Outcomes of Restoration When No Reference Sites Exist

    RESTORATION ECOLOGY, Issue 1 2009
    J. Stephen Brewer
    Abstract Ecological restoration typically seeks to shift species composition toward that of existing reference sites. Yet, comparing the assemblages in restored and reference habitats assumes that similarity to the reference habitat is the optimal outcome of restoration and does not provide a perspective on regionally rare off-site species. When no such reference assemblages of species exist, an accurate assessment of the habitat affinities of species is crucial. We present a method for using a species by habitat data matrix generated by biodiversity surveys to evaluate community responses to habitat restoration treatments. Habitats within the region are rated on their community similarity to a hypothetical restored habitat, other habitats of conservation concern, and disturbed habitats. Similarity scores are reinserted into the species by habitat matrix to produce indicator (I) scores for each species in relation to these habitats. We apply this procedure to an open woodland restoration project in north Mississippi (U.S.A.) by evaluating initial plant community responses to restoration. Results showed a substantial increase in open woodland indicators, a modest decrease in generalists historically restricted to floodplain forests, and no significant change in disturbance indicators as a group. These responses can be interpreted as a desirable outcome, regardless of whether species composition approaches that of reference sites. The broader value of this approach is that it provides a flexible and objective means of predicting and evaluating the outcome of restoration projects involving any group of species in any region, provided there is a biodiversity database that includes habitat and location information. [source]


    Fault detection and isolation for dynamic processes using recursive principal component analysis (PCA) based on filtering of signals

    ASIA-PACIFIC JOURNAL OF CHEMICAL ENGINEERING, Issue 6 2007
    Jyh-Cheng Jeng
    Abstract A systematic procedure for the fault detection and isolation of dynamic systems is presented. The inputs of the process first pass through the dynamic filters which represent the process dynamics. Then, principal component analysis (PCA) is applied to the data matrix consisting of these filtered signals and the process outputs for fault detection. In case of a fault being detected, owing to an artificial linear relationship existing in the data matrix, the last principal component (LPC) is adopted for fault isolation. A recursive algorithm for PCA based on rank-one matrix update of the covariance is derived to compute the LPC on line. Patterns of the LPC are devised to isolate these faults, which include constant-bias and high-frequency noises originating from sensor measurement, errors resulting from input disturbance and change in the process gain. Furthermore, the magnitude of the fault can also be identified from the computed LPC. An illustrative example is used to verify the effectiveness of the proposed method. Copyright © 2007 Curtin University of Technology and John Wiley & Sons, Ltd. [source]
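
The rank-one covariance update that underlies recursive PCA can be sketched as follows. This is the standard running mean and covariance recursion (Welford style), offered only as an illustration: the paper's own recursion, its dynamic filters and the extraction of the last principal component are not reproduced, and the class name is hypothetical.

```python
class RunningCovariance:
    """Sample covariance maintained by one rank-one update per observation."""

    def __init__(self, dim):
        self.n = 0
        self.mean = [0.0] * dim
        self.scatter = [[0.0] * dim for _ in range(dim)]

    def update(self, x):
        self.n += 1
        # Deviation from the old mean, then from the updated mean.
        delta = [xi - mi for xi, mi in zip(x, self.mean)]
        self.mean = [mi + di / self.n for mi, di in zip(self.mean, delta)]
        delta_new = [xi - mi for xi, mi in zip(x, self.mean)]
        # Rank-one term: outer product of the two deviation vectors.
        for i, di in enumerate(delta):
            for j, dj in enumerate(delta_new):
                self.scatter[i][j] += di * dj

    def covariance(self):
        if self.n < 2:
            raise ValueError("need at least two observations")
        return [[s / (self.n - 1) for s in row] for row in self.scatter]


rc = RunningCovariance(2)
for row in [[1.0, 2.0], [2.0, 4.0], [3.0, 7.0]]:
    rc.update(row)
cov = rc.covariance()
```

Each new sample costs a number of operations quadratic in the number of variables, independent of how many samples came before, which is what makes an online PCA update practical; the principal components would then be obtained from an eigendecomposition of `cov`.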


    Determining best complete subsets of specimens and characters for multivariate morphometric studies in the presence of large amounts of missing data

    BIOLOGICAL JOURNAL OF THE LINNEAN SOCIETY, Issue 2 2006
    RICHARD E. STRAUSS
    Missing data are frequent in morphometric studies of both fossil and recent material. A common method of addressing the problem of missing data is to omit combinations of characters and specimens from subsequent analyses; however, omitting different subsets of characters and specimens can affect both the statistical robustness of the analyses and the resulting biological interpretations. We describe a method of examining all possible subsets of complete data and of scoring each subset by the 'condition' (ratio of first eigenvalue to second, or of second to first, depending on context) of the corresponding covariance or correlation matrix, and subsequently choosing the submatrix that either optimizes one of these criteria or matches the estimated condition of the original data matrix. We then describe an extension of this method that can be used to choose the 'best' characters and specimens for which some specified proportion of missing data can be estimated using standard imputation techniques such as the expectation-maximization algorithm or multiple imputation. The methods are illustrated with published and unpublished data sets on fossil and extant vertebrates. Although these problems and methods are discussed in the context of conventional morphometric data, they are applicable to many other kinds of data matrices. © 2006 The Linnean Society of London, Biological Journal of the Linnean Society, 2006, 88, 309–328. [source]


    Calibration of Multivariate Scatter plots for Exploratory Analysis of Relations Within and Between Sets of Variables in Genomic Research

    BIOMETRICAL JOURNAL, Issue 6 2005
    Jan Graffelman
    Abstract The scatter plot is a well known and easily applicable graphical tool to explore relationships between two quantitative variables. For the exploration of relations between multiple variables, generalisations of the scatter plot are useful. We present an overview of multivariate scatter plots focussing on the following situations. Firstly, we look at a scatter plot for portraying relations between quantitative variables within one data matrix. Secondly, we discuss a similar plot for the case of qualitative variables. Thirdly, we describe scatter plots for the relationships between two sets of variables where we focus on correlations. Finally, we treat plots of the relationships between multiple response and predictor variables, focussing on the matrix of regression coefficients. We will present both known and new results, where an important original contribution concerns a procedure for the inclusion of scales for the variables in multivariate scatter plots. We provide software for drawing such scales. We illustrate the construction and interpretation of the plots by means of examples on data collected in a genomic research program on taste in tomato. (© 2005 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim) [source]


    Application of PC-ANN to Acidity Constant Prediction of Various Phenols and Benzoic Acids in Water

    CHINESE JOURNAL OF CHEMISTRY, Issue 5 2008
    Aziz HABIBI-YANGJEH
    Abstract Principal component regression (PCR) and principal component-artificial neural network (PC-ANN) models were applied to prediction of the acidity constant for various benzoic acids and phenols (242 compounds) in water at 25 °C. A large number of theoretical descriptors were calculated for each molecule. The first fifty principal components (PCs) were found to explain more than 95% of the variance in the original data matrix. From the pool of these PCs, the eigenvalue ranking method was employed to select the best set of PCs for the PCR and PC-ANN models. The PC-ANN model with architecture 47-20-1 was generated using 47 principal components as inputs, with pKa as its output. For evaluation of the predictive power of the PCR and PC-ANN models, pKa values of 37 compounds in the prediction set were calculated. The mean percentage deviations (MPD) for the PCR and PC-ANN models are 18.45 and 0.6448, respectively. These improvements are due to the fact that the pKa of the compounds demonstrates non-linear correlations with the principal components. Comparison of the results obtained by the models reveals superiority of the PC-ANN model relative to the PCR model. [source]