Disclosure Risk (disclosure + risk)

Distribution by Scientific Domains


Selected Abstracts


Median-based aggregation operators for prototype construction in ordinal scales

INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, Issue 6 2003
Josep Domingo-Ferrer
This article studies aggregation operators in ordinal scales for their application to clustering (more specifically, to microaggregation for statistical disclosure risk). In particular, we consider these operators in the process of prototype construction. This study analyzes main aggregation operators for ordinal scales [plurality rule, medians, Sugeno integrals (SI), and ordinal weighted means (OWM), among others] and shows the difficulties for their application in this particular setting. Then, we propose two approaches to solve the drawbacks and we study their properties. Special emphasis is given to the study of monotonicity because the operator is proven nonsatisfactory for this property. Exhaustive empirical work shows that in most practical situations, this cannot be considered a problem. © 2003 Wiley Periodicals, Inc. [source]


The probability of identification: applying ideas from forensic statistics to disclosure risk assessment

JOURNAL OF THE ROYAL STATISTICAL SOCIETY: SERIES A (STATISTICS IN SOCIETY), Issue 1 2007
C. J. Skinner
Summary., The paper establishes a correspondence between statistical disclosure control and forensic statistics regarding their common use of the concept of ,probability of identification'. The paper then seeks to investigate what lessons for disclosure control can be learnt from the forensic identification literature. The main lesson that is considered is that disclosure risk assessment cannot, in general, ignore the search method that is employed by an intruder seeking to achieve disclosure. The effects of using several search methods are considered. Through consideration of the plausibility of assumptions and ,worst case' approaches, the paper suggests how the impact of search method can be handled. The paper focuses on foundations of disclosure risk assessment, providing some justification for some modelling assumptions underlying some existing record level measures of disclosure risk. The paper illustrates the effects of using various search methods in a numerical example based on microdata from a sample from the 2001 UK census. [source]


Proposals for 2001 samples of anonymized records: An assessment of disclosure risk

JOURNAL OF THE ROYAL STATISTICAL SOCIETY: SERIES A (STATISTICS IN SOCIETY), Issue 3 2001
Angela Dale
In 1991 Marsh and co-workers made the case for a sample of anonymized records (SAR) from the 1991 census of population. The case was accepted by the Office for National Statistics (then the Office of Population Censuses and Surveys) and a request was made by the Economic and Social Research Council to purchase the SARs. Two files were released for Great Britain,a 2% sample of individuals and a 1% sample of households. Subsequently similar samples were released for Northern Ireland. Since their release, the files have been heavily used for research and there has been no known breach of confidentiality. There is a considerable demand for similar files from the 2001 census, with specific requests for a larger sample size and lower population threshold for the individual SAR. This paper reassesses the analysis of Marsh and co-workers of the risk of identification of an individual or household in a sample of microdata from the 1991 census and also uses alternative ways of assessing risks with the 1991 SARs. The results of both the reassessment and the new analyses are reassuring and allow us to take the 1991 SARs as a base-line against which to assess proposals for changes to the size and structure of samples from the 2001 census. [source]


A measure of disclosure risk for microdata

JOURNAL OF THE ROYAL STATISTICAL SOCIETY: SERIES B (STATISTICAL METHODOLOGY), Issue 4 2002
C. J. Skinner
Summary. Protection against disclosure is important for statistical agencies releasing microdata files from sample surveys. Simple measures of disclosure risk can provide useful evidence to support decisions about release. We propose a new measure of disclosure risk: the probability that a unique match between a microdata record and a population unit is correct. We argue that this measure has at least two advantages. First, we suggest that it may be a more realistic measure of risk than two measures that are currently used with census data. Second, we show that consistent inference (in a specified sense) may be made about this measure from sample data without strong modelling assumptions. This is a surprising finding, in its contrast with the properties of the two ,similar' established measures. As a result, this measure has potentially useful applications to sample surveys. In addition to obtaining a simple consistent predictor of the measure, we propose a simple variance estimator and show that it is consistent. We also consider the extension of inference to allow for certain complex sampling schemes. We present a numerical study based on 1991 census data for about 450 000 enumerated individuals in one area of Great Britain. We show that the theoretical results on the properties of the point predictor of the measure of risk and its variance estimator hold to a good approximation for these data. [source]


Bayesian disclosure risk assessment: predicting small frequencies in contingency tables

JOURNAL OF THE ROYAL STATISTICAL SOCIETY: SERIES C (APPLIED STATISTICS), Issue 5 2007
Jonathan J. Forster
Summary., We propose an approach for assessing the risk of individual identification in the release of categorical data. This requires the accurate calculation of predictive probabilities for those cells in a contingency table which have small sample frequencies, making the problem somewhat different from usual contingency table estimation, where interest is generally focused on regions of high probability. Our approach is Bayesian and provides posterior predictive probabilities of identification risk. By incorporating model uncertainty in our analysis, we can provide more realistic estimates of disclosure risk for individual cell counts than are provided by methods which ignore the multivariate structure of the data set. [source]