Reference Dataset (reference + dataset)
Selected Abstracts

Comparison of missing value imputation methods for crop yield data
ENVIRONMETRICS, Issue 4 2006
Ravindra S. Lokupitiya
Abstract Most ecological data sets contain missing values, a fact which can cause problems in the analysis and limit the utility of resulting inference. However, ecological data also tend to be spatially correlated, which can aid in estimating and imputing missing values. We compared four existing methods of estimating missing values: regression, kernel smoothing, universal kriging, and multiple imputation. Data on crop yields from the National Agricultural Statistical Survey (NASS) and the Census of Agriculture (Ag Census) were the basis for our analysis. Our goal was to find the best method to impute missing values in the NASS datasets. For this comparison, we selected the NASS data for barley crop yield in 1997 as our reference dataset. We found in this case that multiple imputation and regression were superior to methods based on spatial correlation. Universal kriging was found to be the third best method. Kernel smoothing seemed to perform very poorly. Copyright © 2005 John Wiley & Sons, Ltd. [source]

A new strategy to filter out false positive identifications of peptides in SEQUEST database search results
PROTEINS: STRUCTURE, FUNCTION AND BIOINFORMATICS, Issue 22 2007
Jiyang Zhang
Abstract Based on the randomized database method and a linear discriminant function (LDF) model, a new strategy to filter out false positive matches in SEQUEST database search results is proposed. Given an experiment MS/MS dataset and a protein sequence database, a randomized database is constructed and merged with the original database. Then, all MS/MS spectra are searched against the combined database. For each expected false positive rate (FPR), LDFs are constructed for different charge states and used to filter out the false positive matches from the normal database.
In order to investigate the error of FPR estimation, the new strategy was applied to a reference dataset. As a result, the estimated FPR was very close to the actual FPR. When applied to a human K562 cell line dataset, which is a complicated dataset from a real sample, more matches could be confirmed than with the traditional cutoff-based methods at the same estimated FPR. Also, though most of the results confirmed by the LDF model were consistent with those of PeptideProphet, the LDF model could still provide complementary information. These results indicate that the new method can reliably control the FPR of peptide identifications and is more sensitive than traditional cutoff-based methods. [source]

Assessing SNP markers for assigning individuals to cattle populations
ANIMAL GENETICS, Issue 1 2009
R. Negrini
Summary The effectiveness of single nucleotide polymorphisms (SNPs) for the assignment of cattle to their source breeds was investigated by analysing a panel of 90 SNPs assayed on 24 European breeds. Breed assignment was performed by comparing the Bayesian and frequentist methods implemented in the structure 2.2 and geneclass 2 software programs. The use of SNPs for the reallocation of known individuals to their breeds of origin and the assignment of unknown individuals was tested. In the reallocation tests, the methods implemented in structure 2.2 performed better than those in geneclass 2, with 96% vs. 85% correct assignments respectively. In contrast, the methods implemented in geneclass 2 showed a greater correct assignment rate in allocating animals treated as unknowns to a reference dataset (62% vs. 51% and 80% vs. 65% in field tests 1 and 2 respectively). These results demonstrate that SNPs are suitable for the assignment of individuals to reference breeds. The results also indicate that structure 2.2 and geneclass 2 can be complementary tools to assess breed integrity and assignment.
Our findings also stress the importance of a high-quality reference dataset in allocation studies. [source]

Determination of zinc incorporation in the Zn-substituted gallophosphate ZnULM-5 by multiple wavelength anomalous dispersion techniques
ACTA CRYSTALLOGRAPHICA SECTION B, Issue 3 2010
M. Helliwell
The location of isomorphously substituted zinc over eight crystallographically different gallium sites has been determined in a single-crystal study of the gallophosphate ZnULM-5, GaZn(PO4)14(HPO4)2(OH)2F7, [H3N{CH2}6NH3]4, 6H2O, in an 11 wavelength experiment, using data from Station 9.8, SRS Daresbury. The measurement of datasets around the K edges of both Ga and Zn, as well as two reference datasets away from each absorption edge, was utilized to selectively exploit dispersive differences of each metal atom type in turn, which allowed the major sites of Zn incorporation to be identified as the metal 1 and 3 sites, M1 and M3. The preferential substitution of Zn at these sites probably arises because they are located in double four-ring (D4R) building units which can relax to accommodate the incorporation of hetero atoms. As the crystal is non-centrosymmetric, with space group P2₁2₁2, it was also possible to use anomalous differences to corroborate the results obtained from the dispersive differences. These results were obtained firstly from difference Fourier maps, calculated using a phase set from the refined structure from data measured at the Zr K edge. Also, refined dispersive and anomalous occupancies, on an absolute scale, could be obtained using the program MLPHARE, allowing estimates for the Zn incorporation of approximately 22 and 18 at. % at the M1 and M3 sites to be obtained. In addition, f′ and f″ values for Ga and Zn at each wavelength could be estimated both from MLPHARE results, and by refinement in JANA2006.
The fully quantitative determinations of the dispersive and anomalous coefficients for Ga and Zn at each wavelength, as well as metal atom occupancies over the eight metal atom sites, made use of the CCP4 program MLPHARE as well as SHELXL and JANA2006. The results from these methods agree closely, and JANA2006 allowed the ready determination of standard uncertainties on the occupancy parameters, which were, for M1 and M3, 20.6 (3) and 17.2 (3) at. %, respectively. [source]