Distance Metrics (distance + metric)


Selected Abstracts


Application of the Levenshtein Distance Metric for the Construction of Longitudinal Data Files

EDUCATIONAL MEASUREMENT: ISSUES AND PRACTICE, Issue 2 2010
Harold C. Doran
The analysis of longitudinal data in education is becoming more prevalent given the nature of testing systems constructed for the No Child Left Behind Act (NCLB). However, constructing the longitudinal data files remains a significant challenge. Students move into new schools, but in many cases the unique identifiers (ID) that should remain constant for each student change. As a result, different students frequently share the same ID, and merging records for an ID that is erroneously assigned to different students clearly becomes problematic. In small data sets, quality assurance of the merge can proceed through human reviews of the data to ensure all merged records are properly joined. However, in data sets with hundreds of thousands of cases, quality assurance via human review is impossible. While the record linkage literature has many applications in other disciplines, the educational measurement literature lacks details of formal protocols that can be used for quality assurance procedures for longitudinal data files. This article presents an empirical quality assurance procedure that may be used to verify the integrity of the merges performed for longitudinal analysis. We also discuss possible extensions that would permit merges to occur even when unique identifiers are not available.
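
The Levenshtein distance is the minimum number of single-character insertions, deletions, and substitutions that turn one string into another. As a minimal sketch of how it could flag suspicious merges, the code below computes it with the standard dynamic-programming recurrence and compares the student names on two records joined by the same ID. The record layout, the `flag_suspect_merge` helper, and the threshold of 2 edits are hypothetical illustrations, not the article's protocol.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits that turn a into b."""
    # prev[j] is the distance between the previous prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def flag_suspect_merge(name_year1: str, name_year2: str, max_edits: int = 2) -> bool:
    """Hypothetical QA rule: flag the merge if normalized names differ by > max_edits."""
    a = name_year1.strip().lower()
    b = name_year2.strip().lower()
    return levenshtein(a, b) > max_edits


# Records joined on the same student ID in two test years (illustrative data).
merged = [
    ("1001", "Maria Gonzalez", "Maria Gonzales"),  # likely a typo: same student
    ("1002", "James Smith", "Aisha Brown"),        # ID reused: different student
]
for sid, n1, n2 in merged:
    print(sid, levenshtein(n1.lower(), n2.lower()), flag_suspect_merge(n1, n2))
```

Only flagged pairs would then need human review, which keeps manual checking feasible on large files.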


DiFi: Fast 3D Distance Field Computation Using Graphics Hardware

COMPUTER GRAPHICS FORUM, Issue 3 2004
Avneesh Sud
We present an algorithm for fast computation of discretized 3D distance fields using graphics hardware. Given a set of primitives and a distance metric, our algorithm computes the distance field for each slice of a uniform spatial grid by rasterizing the distance functions of the primitives. We compute bounds on the spatial extent of the Voronoi region of each primitive. These bounds are used to cull and clamp the distance functions rendered for each slice. Our algorithm is applicable to all geometric models and does not make any assumptions about connectivity or a manifold representation. We have used our algorithm to compute distance fields of large models composed of tens of thousands of primitives on high-resolution grids. Moreover, we demonstrate its application to medial axis evaluation and proximity computations. Compared to earlier approaches, we achieve an order-of-magnitude improvement in running time. Categories and Subject Descriptors (according to ACM CCS): I.3.3 [Computer Graphics]: Distance fields, Voronoi regions, graphics hardware, proximity computations
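
The value stored at each grid cell of a slice is the minimum, over all primitives, of that primitive's distance function; the primitive attaining the minimum identifies the cell's discrete Voronoi region. The brute-force CPU sketch below computes exactly that for one 2D slice, assuming point primitives and the Euclidean metric. It mirrors what per-pixel depth testing produces when each distance function is rasterized, but it does not implement the paper's culling and clamping bounds; `distance_slice` is a hypothetical name.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def distance_slice(sites: Sequence[Point], width: int, height: int,
                   spacing: float = 1.0) -> Tuple[List[List[float]], List[List[int]]]:
    """Brute-force distance field and discrete Voronoi labels on a uniform 2D grid."""
    dist = [[math.inf] * width for _ in range(height)]
    label = [[-1] * width for _ in range(height)]
    for k, (sx, sy) in enumerate(sites):          # one "render pass" per primitive
        for row in range(height):
            y = (row + 0.5) * spacing             # cell-center coordinates
            for col in range(width):
                x = (col + 0.5) * spacing
                d = math.hypot(x - sx, y - sy)
                if d < dist[row][col]:            # depth-test analogue: keep the minimum
                    dist[row][col] = d
                    label[row][col] = k
    return dist, label


sites = [(0.5, 0.5), (3.5, 0.5)]
dist, label = distance_slice(sites, width=4, height=1)
print([round(v, 2) for v in dist[0]])  # → [0.0, 1.0, 1.0, 0.0]
print(label[0])                        # → [0, 0, 1, 1]
```

The cost is O(cells × primitives), which is the work the paper's Voronoi-region bounds are designed to cut down.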


Socio-economic distance and spatial patterns in unemployment

JOURNAL OF APPLIED ECONOMETRICS, Issue 4 2002
Timothy G. Conley
This paper examines the spatial patterns of unemployment in Chicago between 1980 and 1990. We study unemployment clustering with respect to different social and economic distance metrics that reflect the structure of agents' social networks. Specifically, we use physical distance, travel time, and differences in ethnic and occupational distribution between locations. Our goal is to determine whether our estimates of spatial dependence are consistent with models in which agents' employment status is affected by information exchanged locally within their social networks. We present non-parametric estimates of correlation across Census tracts as a function of each distance metric as well as pairs of metrics, both for unemployment rate itself and after conditioning on a set of tract characteristics. Our results indicate that there is a strong positive and statistically significant degree of spatial dependence in the distribution of raw unemployment rates, for all our metrics. However, once we condition on a set of covariates, most of the spatial autocorrelation is eliminated, with the exception of physical and occupational distance. Racial and ethnic composition variables are the single most important factor in explaining the observed correlation patterns. Copyright 2002 John Wiley & Sons, Ltd.
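
A basic non-parametric reading of "correlation as a function of distance" bins every pair of locations by its distance under a chosen metric and averages the cross-products of demeaned values within each bin, normalized by the overall variance. The sketch below does this for one metric. The paper's estimator is more elaborate (it also handles pairs of metrics and conditioning on covariates), so treat this binned correlogram, the `correlogram` name, and the bin edges as illustrative assumptions.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple


def correlogram(values: Sequence[float], locations: Sequence[Any],
                metric: Callable[[Any, Any], float],
                bin_edges: Sequence[float]) -> Dict[Tuple[float, float], float]:
    """Mean normalized cross-product of demeaned values for location pairs in each distance bin."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    var = sum(d * d for d in dev) / n
    if var == 0:
        raise ValueError("values have zero variance")
    nbins = len(bin_edges) - 1
    sums: List[float] = [0.0] * nbins
    counts: List[int] = [0] * nbins
    for i in range(n):
        for j in range(i + 1, n):
            d = metric(locations[i], locations[j])
            for b in range(nbins):
                if bin_edges[b] <= d < bin_edges[b + 1]:
                    sums[b] += dev[i] * dev[j]
                    counts[b] += 1
                    break
    return {(bin_edges[b], bin_edges[b + 1]): sums[b] / (counts[b] * var)
            for b in range(nbins) if counts[b] > 0}


# Four tracts on a line (hypothetical rates); physical distance is |a - b|.
rates = [0.10, 0.10, 0.30, 0.30]
positions = [0.0, 1.0, 2.0, 3.0]
result = correlogram(rates, positions, lambda a, b: abs(a - b), [0.5, 1.5, 2.5, 3.5])
print(result)  # approximately {(0.5, 1.5): 0.33, (1.5, 2.5): -1.0, (2.5, 3.5): -1.0}
```

Swapping the `metric` callable (for example, a travel-time lookup or a distance between occupational distributions) re-estimates the same curve under a different notion of distance.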


A new rank correlation coefficient with application to the consensus ranking problem

JOURNAL OF MULTI CRITERIA DECISION ANALYSIS, Issue 1 2002
Edward J. Emond
The consensus ranking problem has received much attention in the statistical literature. Given m rankings of n objects, the objective is to determine a consensus ranking. The input rankings may contain ties, be incomplete, and may be weighted. Two solution concepts are discussed: the first maximizes the average weighted rank correlation of the solution ranking with the input rankings, and the second minimizes the average weighted Kemeny-Snell distance. A new rank correlation coefficient called τx is presented, which is shown to be the unique rank correlation coefficient that is equivalent to the Kemeny-Snell distance metric. The new rank correlation coefficient is closely related to Kendall's tau but differs from it in the way ties are handled. It will be demonstrated that Kendall's τb is flawed as a measure of agreement between weak orderings and should no longer be used as a rank correlation coefficient. The use of τx in the consensus ranking problem provides a more mathematically tractable solution than the Kemeny-Snell distance metric because all the ranking information can be summarized in a single matrix. The methods described in this paper allow analysts to accommodate the fully general consensus ranking problem with weights, ties, and partial inputs. Copyright 2002 John Wiley & Sons, Ltd.
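
To make the difference in tie handling concrete: in the Kemeny-Snell score matrix, entry (i, j) is 1 if object i is ranked ahead of j, -1 if behind, and 0 if tied, whereas in the τx score matrix a tie scores 1 (our reading of the definition). Working through the five possible cases for a pair shows that τx = 1 - 2d/(n(n-1)), where d is the Kemeny-Snell distance. The sketch below computes both for complete rankings given as positions, where equal positions mean a tie, and checks that identity; weights and partial rankings are omitted.

```python
from typing import List, Sequence


def kemeny_snell_matrix(pos: Sequence[int]) -> List[List[int]]:
    """Entry [i][j]: 1 if object i is ranked ahead of j, -1 if behind, 0 if tied or i == j."""
    n = len(pos)
    return [[(pos[i] < pos[j]) - (pos[i] > pos[j]) for j in range(n)] for i in range(n)]


def tau_x_matrix(pos: Sequence[int]) -> List[List[int]]:
    """Like the Kemeny-Snell matrix, except that a tie scores 1; the diagonal stays 0."""
    n = len(pos)
    return [[0 if i == j else (1 if pos[i] <= pos[j] else -1) for j in range(n)]
            for i in range(n)]


def kemeny_snell_distance(p: Sequence[int], q: Sequence[int]) -> float:
    a, b = kemeny_snell_matrix(p), kemeny_snell_matrix(q)
    n = len(p)
    return sum(abs(a[i][j] - b[i][j]) for i in range(n) for j in range(n)) / 2


def tau_x(p: Sequence[int], q: Sequence[int]) -> float:
    a, b = tau_x_matrix(p), tau_x_matrix(q)
    n = len(p)
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / (n * (n - 1))


# Positions of four objects; in q, objects 0 and 1 are tied.
p = [1, 2, 3, 4]
q = [1, 1, 2, 3]
n = len(p)
d = kemeny_snell_distance(p, q)
t = tau_x(p, q)
print(d, round(t, 4))                              # → 1.0 0.8333
print(abs(t - (1 - 2 * d / (n * (n - 1)))) < 1e-12)  # → True
```

Because a tie scores 1 against itself, an all-tied ranking compared with itself gets τx = 1, so agreement between weak orderings stays well defined.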


Robustness of Chi-square and Canberra distance metrics for computer intrusion detection

QUALITY AND RELIABILITY ENGINEERING INTERNATIONAL, Issue 1 2002
Syed Masum Emran
Intrusion detection complements intrusion prevention mechanisms, such as firewalls, cryptography, and authentication, to capture intrusions into an information system while they are acting on the information system. We develop two multivariate quality control techniques based on chi-square and Canberra distance metrics, respectively, to detect intrusions by building a long-term profile of normal activities in the information system (norm profile) and using the norm profile to detect anomalies. We investigate the robustness of these two distance metrics by comparing their performance on a number of data sets involving different noise levels in data. The performance results indicate that the chi-square distance metric is much more robust to noise than the Canberra distance metric. Copyright 2002 John Wiley & Sons, Ltd.
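
Both detectors score an observed event vector X against a norm profile, taken here as the per-feature means E of normal training events. A common chi-square form is Σ (X_i - E_i)² / E_i, and the Canberra distance is Σ |X_i - E_i| / (|X_i| + |E_i|). The sketch below computes both and flags an event whose score exceeds a control limit derived from the training scores. The event features, the mean-plus-three-standard-deviations limit, and the helper names are assumptions for illustration, not the paper's exact charting procedure.

```python
import statistics
from typing import Callable, List, Sequence

Vec = Sequence[float]


def norm_profile(events: Sequence[Vec]) -> List[float]:
    """Per-feature mean over normal (attack-free) training events."""
    return [statistics.fmean(col) for col in zip(*events)]


def chi_square(x: Vec, e: Vec) -> float:
    """Sum of (x_i - e_i)^2 / e_i; features with e_i == 0 are skipped."""
    return sum((xi - ei) ** 2 / ei for xi, ei in zip(x, e) if ei != 0)


def canberra(x: Vec, e: Vec) -> float:
    """Sum of |x_i - e_i| / (|x_i| + |e_i|); a 0/0 term counts as 0."""
    return sum(abs(xi - ei) / (abs(xi) + abs(ei))
               for xi, ei in zip(x, e) if abs(xi) + abs(ei) != 0)


def control_limit(train: Sequence[Vec], profile: Vec,
                  dist: Callable[[Vec, Vec], float]) -> float:
    """Hypothetical upper limit: mean + 3 population std. dev. of training scores."""
    scores = [dist(x, profile) for x in train]
    return statistics.fmean(scores) + 3 * statistics.pstdev(scores)


# Hypothetical per-window counts of three audit-event types.
train = [[10, 2, 0], [12, 3, 0], [11, 2, 1], [9, 3, 0]]
profile = norm_profile(train)  # → [10.5, 2.5, 0.25]
suspect = [40, 15, 6]
for name, dist in (("chi-square", chi_square), ("Canberra", canberra)):
    score, limit = dist(suspect, profile), control_limit(train, profile, dist)
    print(name, round(score, 2), score > limit)
```

Note that each Canberra term lies between 0 and 1 and equals 1 whenever exactly one of the two values is zero, so sparse features contribute large terms regardless of magnitude.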


Adaptive thinning of atmospheric observations in data assimilation with vector quantization and filtering methods

THE QUARTERLY JOURNAL OF THE ROYAL METEOROLOGICAL SOCIETY, Issue 613 2005
T. Ochotta
In data assimilation for numerical weather prediction, measurements of various observation systems are combined with background data to define initial states for the forecasts. Current and future observation systems, in particular satellite instruments, produce large numbers of measurements with high spatial and temporal density. Such datasets significantly increase the computational costs of the assimilation and, moreover, can violate the assumption of spatially independent observation errors. To ameliorate these problems, we propose two greedy thinning algorithms, which reduce the number of assimilated observations while retaining the essential information content of the data. In the first method, the number of points in the output set is increased iteratively. We use a clustering method with a distance metric that combines spatial distance with difference in observation values. In a second scheme, we iteratively estimate the redundancy of the current observation set and remove the most redundant data points. We evaluate the proposed methods with respect to a geometric error measure and compare them with a uniform sampling scheme. We obtain good representations of the original data with thinnings retaining only a small portion of observations. We also evaluate our thinnings of ATOVS satellite data using the assimilation system of the Deutscher Wetterdienst. Impact of the thinning on the analysed fields and on the subsequent forecasts is discussed. Copyright 2005 Royal Meteorological Society
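
The first scheme grows the output set one point at a time under a metric that mixes position with observed value. A simple stand-in with the same flavour is greedy farthest-point insertion: start from one observation and repeatedly add the one whose combined distance to its nearest already-selected observation is largest. The combined metric below, sqrt(|p - q|² / s_x² + (v_p - v_q)² / s_v²), its scales s_x and s_v, and the tuple layout are hypothetical choices; this is not the paper's clustering algorithm.

```python
import math
from typing import List, Sequence, Tuple

Obs = Tuple[float, float, float]  # (x, y, observed value): hypothetical layout


def combined_distance(a: Obs, b: Obs, s_x: float, s_v: float) -> float:
    """Scaled spatial separation and scaled value difference, combined in quadrature."""
    spatial2 = (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    return math.sqrt(spatial2 / s_x ** 2 + (a[2] - b[2]) ** 2 / s_v ** 2)


def greedy_thin(obs: Sequence[Obs], k: int, s_x: float = 1.0, s_v: float = 1.0) -> List[int]:
    """Indices of up to k observations chosen by greedy farthest-point insertion."""
    selected = [0]  # arbitrary seed observation
    # nearest[i]: combined distance from obs[i] to its closest selected observation
    nearest = [combined_distance(o, obs[0], s_x, s_v) for o in obs]
    while len(selected) < min(k, len(obs)):
        nxt = max(range(len(obs)), key=lambda i: nearest[i])
        if nearest[nxt] == 0.0:
            break  # every remaining observation duplicates a selected one
        selected.append(nxt)
        for i, o in enumerate(obs):
            nearest[i] = min(nearest[i], combined_distance(o, obs[nxt], s_x, s_v))
    return selected


# Two nearly identical observations, plus two that are close in space but differ in value.
obs = [(0.0, 0.0, 280.0), (0.1, 0.0, 280.1), (5.0, 0.0, 280.0), (5.0, 0.1, 290.0)]
print(greedy_thin(obs, 3))  # → [0, 3, 2]
```

Observation 1 is dropped as nearly redundant with observation 0, while observations 2 and 3 are both kept although they sit only 0.1 apart, because their values differ.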


A Goodness-of-Fit Test for Multinomial Logistic Regression

BIOMETRICS, Issue 4 2006
Jelle J. Goeman
This article presents a score test to check the fit of a logistic regression model with two or more outcome categories. The null hypothesis that the model fits well is tested against the alternative that residuals of samples close to each other in covariate space tend to deviate from the model in the same direction. We propose a test statistic that is a sum of squared smoothed residuals, and show that it can be interpreted as a score test in a random effects model. By specifying the distance metric in covariate space, users can choose the alternative against which the test is directed, making it either an omnibus goodness-of-fit test or a test for lack of fit of specific model variables or outcome categories.
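
A sum of squared smoothed residuals can be sketched as follows: each sample's residuals (outcome indicator minus fitted probability, per category) are averaged with kernel weights that decay with distance in covariate space, then squared and summed. The Gaussian kernel with bandwidth `h` is an assumed choice of weights, and `smoothed_residual_statistic` is a hypothetical name. The sketch computes only the raw statistic, not its null distribution or p-value, and is not the authors' implementation.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def smoothed_residual_statistic(
    x: Sequence[Vector],                # covariates, one vector per sample
    y: Sequence[int],                   # observed category index per sample
    probs: Sequence[Sequence[float]],   # fitted probability per sample and category
    metric: Callable[[Vector, Vector], float],
    h: float = 1.0,
) -> float:
    """Sum over samples and categories of squared kernel-smoothed residuals."""
    n, k = len(x), len(probs[0])
    # Raw residuals: one-hot outcome minus fitted probability.
    resid = [[(1.0 if y[i] == c else 0.0) - probs[i][c] for c in range(k)] for i in range(n)]
    # Gaussian kernel weights on the chosen covariate-space distance (an assumption).
    w = [[math.exp(-(metric(x[i], x[j]) / h) ** 2) for j in range(n)] for i in range(n)]
    total = 0.0
    for i in range(n):
        for c in range(k):
            smoothed = sum(w[i][j] * resid[j][c] for j in range(n))
            total += smoothed ** 2
    return total


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


x = [[0.0], [0.1], [5.0], [5.1]]
uniform = [[0.5, 0.5]] * 4  # a model that ignores the covariate
clustered = smoothed_residual_statistic(x, [0, 0, 1, 1], uniform, euclidean)
alternating = smoothed_residual_statistic(x, [0, 1, 0, 1], uniform, euclidean)
print(clustered > alternating)  # → True
```

When neighbours in covariate space misfit in the same direction, their smoothed residuals reinforce each other and the statistic grows; when they misfit in opposite directions, the residuals largely cancel. Passing a metric that looks only at one covariate would direct the test at lack of fit in that variable.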


Proxy caching algorithms and implementation for time-shifted TV services

EUROPEAN TRANSACTIONS ON TELECOMMUNICATIONS, Issue 2 2008
Tim Wauters
The increasing popularity of multimedia streaming applications introduces new challenges in content distribution networks (CDNs). Streaming services such as Video on Demand (VoD) or digital television over the Internet (IPTV) are very bandwidth-intensive and cannot tolerate the high start-up delays and poor loss properties of today's Internet. To solve these problems, caching (the initial segment of) popular streams at proxies could be envisaged. This paper presents a novel caching algorithm and architecture for time-shifted television (tsTV) and its implementation, using the IETF's Real-Time Streaming Protocol (RTSP). The algorithm uses sliding caching windows with sizes depending on content popularity and/or distance metrics. The caches can work in stand-alone mode as well as in co-operative mode. This paper shows that the network load can already be reduced considerably using small diskless caches, especially when using co-operative caching. A prototype implementation is detailed and evaluated through performance measurements. Copyright 2007 John Wiley & Sons, Ltd.
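
A sliding caching window for time-shifted TV keeps the most recent stretch of each channel's live stream at the proxy, so viewers who start a programme slightly late can be served locally. The sketch below sizes each channel's window in proportion to its share of requests. The proportional rule, the segment-count budget, and the class and method names are hypothetical; the paper's distance-based sizing and co-operative mode are not modelled.

```python
from collections import deque
from typing import Deque, Dict


class SlidingWindowCache:
    """Per-channel FIFO of the most recent segments; window length follows popularity."""

    def __init__(self, total_segments: int) -> None:
        self.total = total_segments  # cache budget, in segments
        self.windows: Dict[str, Deque[int]] = {}

    def resize(self, popularity: Dict[str, float]) -> None:
        """Give each channel a window proportional to its request share (at least 1)."""
        total_pop = sum(popularity.values())
        for channel, pop in popularity.items():
            size = max(1, int(self.total * pop / total_pop))
            old = self.windows.get(channel, deque())
            # deque(..., maxlen=size) keeps only the newest `size` segments.
            self.windows[channel] = deque(old, maxlen=size)

    def on_live_segment(self, channel: str, seq: int) -> None:
        """Append a newly broadcast segment; the oldest one slides out automatically."""
        if channel in self.windows:
            self.windows[channel].append(seq)

    def can_serve(self, channel: str, seq: int) -> bool:
        """True if a time-shifted request for segment `seq` is a cache hit."""
        return seq in self.windows.get(channel, ())


cache = SlidingWindowCache(total_segments=60)
cache.resize({"news": 3.0, "sports": 1.0})  # 45 and 15 segments
for seq in range(100):
    cache.on_live_segment("news", seq)
    cache.on_live_segment("sports", seq)
print(cache.can_serve("news", 60), cache.can_serve("sports", 60))  # → True False
```

The popular channel can satisfy requests shifted further back in time, while the less popular one only covers the most recent segments.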


Not seeing the ocean for the islands: the mediating influence of matrix-based processes on forest fragmentation effects

GLOBAL ECOLOGY, Issue 1 2006
John A. Kupfer
The pervasive influence of island biogeography theory on forest fragmentation research has often led to a misleading conceptualization of landscapes as areas of forest/habitat and 'non-forest/non-habitat' and an overriding focus on processes within forest remnants at the expense of research in the human-modified matrix. The matrix, however, may neither be uniformly unsuitable as habitat nor serve as a fully absorbing barrier to the dispersal of forest taxa. In this paper, we present a conceptual model that addresses how forest habitat loss and fragmentation affect biodiversity through reduction of the resource base, subdivision of populations, alterations of species interactions and disturbance regimes, modifications of microclimate and increases in the presence of invasive species and human pressures on remnants. While we acknowledge the importance of changes associated with the forest remnants themselves (e.g. decreased forest area and increased isolation of forest patches), we stress that the extent, intensity and permanence of alterations to the matrix will have an overriding influence on area and isolation effects and emphasize the potential roles of the matrix as not only a barrier but also as habitat, source and conduit. Our intention is to argue for shifting the examination of forest fragmentation effects away from a patch-based perspective focused on factors such as patch area and distance metrics to a landscape mosaic perspective that recognizes the importance of gradients in habitat conditions.


The Revenge of Distance: Vulnerability Analysis of Critical Information Infrastructure

JOURNAL OF CONTINGENCIES AND CRISIS MANAGEMENT, Issue 2 2004
Sean P. Gorman
The events of 11 September 2001 brought an increased focus on security in the United States and specifically the protection of critical infrastructure. Critical infrastructure encompasses a wide array of physical assets such as the electric power grid, telecommunications, oil and gas pipelines, transportation networks and computer data networks. This paper will focus on computer data networks and the spatial implications of their susceptibility to targeted attacks. Utilising a database of national data carriers, simulations will be run to determine the repercussions of targeted attacks and to assess the relative merits of different methods of identifying critical nodes. This analysis will include comparison of current methods employed in vulnerability analysis with spatially constructed methods incorporating regional and distance variables. In addition to vulnerability analysis, a method will be proposed to analyse the fusion of physical and logical networks, and the new avenues this approach reveals will be discussed. The analysis concludes that spatial information networks are vulnerable to targeted attacks and that algorithms based on distance metrics do a better job of identifying critical nodes than classic accessibility indexes. The results of the analysis are placed in the context of public policy, posing the question of whether private infrastructure owners have sufficient incentives to remedy vulnerabilities in critical networks.
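
A targeted-attack simulation can be pictured as: rank nodes by a criticality score, remove them in that order, and track how much of the network stays connected. The sketch below uses the size of the largest connected component as the damage measure and scores nodes once, on the intact graph. The toy graph and coordinates, the degree score, and the distance-based score (total geographic length of a node's links) are hypothetical stand-ins, not the paper's data or its specific algorithms.

```python
import math
from typing import Callable, Dict, List, Set

Graph = Dict[str, Set[str]]


def largest_component(g: Graph, removed: Set[str]) -> int:
    """Size of the largest connected component after deleting `removed` nodes."""
    seen: Set[str] = set()
    best = 0
    for start in g:
        if start in removed or start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:  # iterative depth-first search
            node = stack.pop()
            size += 1
            for nb in g[node]:
                if nb not in removed and nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        best = max(best, size)
    return best


def attack(g: Graph, score: Callable[[str], float], k: int) -> List[int]:
    """Remove the k highest-scoring nodes in turn; record the largest component after each."""
    order = sorted(g, key=score, reverse=True)[:k]
    removed: Set[str] = set()
    sizes = []
    for node in order:
        removed.add(node)
        sizes.append(largest_component(g, removed))
    return sizes


# Two small clusters joined by one long link B-F (hypothetical coordinates).
coords = {"A": (0, 0), "B": (1, 0), "C": (-1, 0), "D": (0, 1),
          "E": (10, 0), "F": (9, 0), "G": (11, 0), "H": (10, 1)}
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "F"),
         ("E", "F"), ("E", "G"), ("E", "H")]
g: Graph = {n: set() for n in coords}
for u, v in edges:
    g[u].add(v)
    g[v].add(u)


def degree(n: str) -> float:
    return len(g[n])


def link_length(n: str) -> float:
    return sum(math.dist(coords[n], coords[m]) for m in g[n])


print(attack(g, degree, 1), attack(g, link_length, 1))  # → [5] [4]
```

In this toy graph, removing the highest-degree hub leaves a five-node component, while removing an endpoint of the long link splits the network into pieces of at most four nodes.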

