Appendix 7
De-identification techniques and privacy evaluation in synthetic data
Updated guidance for reducing linkability, understanding disclosure risks, and evaluating synthetic data privacy using a portfolio of evidence rather than a single score.
De-identification is not binary
The appendix treats de-identification as a risk-management exercise. Stronger privacy controls generally reduce utility, so the trade-off must be documented and justified.
Privacy risk is multi-dimensional
Identity, membership, and attribute disclosure can each arise in different ways. Evaluating only one of them is not enough.
Built for governance teams
Use this appendix when custodians, requestors, scientists, and reviewers are documenting Step 4 decisions and supporting a Re-identification Risk Assessment.
De-identification techniques
De-identification refers to technical and organisational approaches that reduce the likelihood that data can be associated with an identifiable person. Even without direct identifiers, data may still be personal information if it remains reasonably linkable.
Aggregation and suppression
Suppress identifiers and overtly identifying fields, or aggregate values to reduce granularity, so the data is less easily linkable to a person.
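As a minimal sketch, the pandas snippet below suppresses a direct identifier column and aggregates postcode to a coarser region. The column names and the level of aggregation are illustrative assumptions, not recommended settings.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["A. Smith", "B. Jones"],      # hypothetical direct identifier
    "postcode": ["2000", "2641"],          # hypothetical quasi-identifier
    "diagnosis": ["X", "Y"],
})

# Suppression: drop the direct identifier entirely.
df = df.drop(columns=["name"])

# Aggregation: coarsen postcode to a broader region (first digit only here).
df["region"] = df["postcode"].str[:1]
df = df.drop(columns=["postcode"])
```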
Generalisation
Replace precise values with broader categories, such as substituting exact dates of birth with age bands.
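A brief illustration of generalisation with pandas, replacing exact ages with bands. The bin edges and labels are illustrative choices only and should follow the granularity justified in the risk assessment.

```python
import pandas as pd

ages = pd.Series([23, 37, 41, 68])

# Generalisation: exact ages become age bands.
age_bands = pd.cut(
    ages,
    bins=[0, 30, 40, 50, 60, 120],
    labels=["<30", "30-39", "40-49", "50-59", "60+"],
)
```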
Pseudonymisation
Use cryptographically protected transformations such as keyed hashing with appropriate key management rather than plain hashing alone.
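The sketch below shows keyed hashing with HMAC-SHA256 from the Python standard library. In practice the key would be generated and held in a secure key-management service with rotation and access controls; that infrastructure is assumed rather than shown here.

```python
import hashlib
import hmac
import secrets

# Illustration only: a real key lives in a secrets manager or HSM,
# never inline in code.
key = secrets.token_bytes(32)

def pseudonymise(identifier: str) -> str:
    """Keyed hash (HMAC-SHA256). Unlike a plain hash, values cannot be
    recovered by a dictionary attack without access to the key."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
```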
Perturbation
Introduce controlled change through noise addition, micro-aggregation, or data swapping to reduce disclosure risk.
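A minimal noise-addition sketch using NumPy. The Laplace scale parameter is an illustrative assumption and would need to be calibrated against the documented risk/utility trade-off.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
incomes = np.array([52_000.0, 61_500.0, 48_200.0])

# Perturbation by additive noise: larger scale means more protection
# and less utility. The scale of 500.0 is an illustrative choice.
noisy_incomes = incomes + rng.laplace(loc=0.0, scale=500.0, size=incomes.shape)
```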
Types of privacy risk
The appendix distinguishes among multiple disclosure risks. Real privacy assessment needs to account for all of them, not just direct re-identification.
Identity disclosure
A synthetic record can be confidently linked to a specific person. Direct identifiers should already be removed, but residual linkability can still matter.
Membership disclosure
An adversary can infer whether a specific person was included in the training dataset, which can itself be highly sensitive.
Attribute disclosure
An adversary can infer new sensitive information about an individual using synthetic data plus auxiliary knowledge they already hold.
Landscape of evaluation metrics
No single measure defines privacy safety. These methods should be read as lenses on risk, not as bounded guarantees of privacy loss.
Categories of privacy metrics
Use multiple methods aligned to a realistic threat model.
| Type | Category | Method | What it tells you |
|---|---|---|---|
| Non-adversarial | Re-identifiability | k-Anonymity | Checks whether each individual is indistinguishable from at least k - 1 other individuals with respect to a set of quasi-identifiers. |
| Non-adversarial | Re-identifiability | l-Diversity | Extends k-anonymity by ensuring sensitive attributes within each anonymised group have at least l distinct values. |
| Non-adversarial | Re-identifiability | t-Closeness | Requires the distribution of a sensitive attribute in a group to remain close to the overall dataset distribution. |
| Non-adversarial | Memorisation and similarity | Hitting Rate (Common Row Proportion) | Measures the percentage of exact matching rows between the synthetic and source data. |
| Non-adversarial | Memorisation and similarity | Close Value Ratio | Assesses the probability of near matches using a distance threshold. |
| Non-adversarial | Memorisation and similarity | Similarity Ratio (epsilon-identifiability) | Tests whether fewer than an epsilon proportion of synthetic observations are too similar to records in the original dataset. |
| Non-adversarial | Memorisation and similarity | Nearest Neighbour Accuracy | Evaluates proximity between source and synthetic distributions, but should be interpreted cautiously because similarity-based metrics can miss serious leakage. |
| Non-adversarial | Distinguishability | Data Likelihood | Measures the likelihood of synthetic data belonging to the source data distribution. |
| Non-adversarial | Distinguishability | Detection Rate | Measures how easily models can distinguish source data from synthetic data. |
| Adversarial | Singling out attacks | Singling Out Attack (Univariate) | Observes the uniqueness of a single attribute in the synthetic data. |
| Adversarial | Singling out attacks | Singling Out Attack (Multivariate) | Examines uniqueness across combinations of attributes. |
| Adversarial | Record linkage attacks | Public-Public Linkage | Uses the synthetic dataset to establish links between records found in two external datasets. |
| Adversarial | Record linkage attacks | Public-Synthetic Linkage | Links synthetic rows to an external dataset using matching criteria, creating a basis for inference attacks. |
| Adversarial | Attribute inference attacks | Exact Match AIA | Determines a missing target attribute by matching overlapping quasi-identifiers. |
| Adversarial | Attribute inference attacks | Closest Distance AIA | Infers a sensitive value using the nearest synthetic neighbour where k = 1. |
| Adversarial | Attribute inference attacks | Nearest Neighbours AIA | Uses the k nearest synthetic neighbours where k is greater than 1. |
| Adversarial | Attribute inference attacks | ML Inference AIA | Trains a predictive model on synthetic data to infer target attributes. |
| Adversarial | Membership inference attacks | Closest Distance MIA | Infers membership if a target record is more similar to synthetic data than to unrelated data. |
| Adversarial | Membership inference attacks | Nearest Neighbours MIA | Extends the closest-distance approach to proximity against multiple neighbours, but still inherits the limits of similarity-based methods. |
| Adversarial | Membership inference attacks | Probability Estimation MIA | Uses hypothesis testing to assess whether a target record belongs to the synthetic data distribution. |
| Adversarial | Membership inference attacks | MIA Shadow Model | Uses shadow models trained with and without the target record to classify membership. |
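To make two of the non-adversarial checks above concrete, the sketch below computes a Hitting Rate (Common Row Proportion) and a Close Value Ratio with pandas and NumPy. Exact definitions vary across tools, so the distance metric and the threshold here are assumptions that must be calibrated to the data.

```python
import numpy as np
import pandas as pd

def hitting_rate(source: pd.DataFrame, synthetic: pd.DataFrame) -> float:
    """Common Row Proportion: share of synthetic rows that exactly
    match some source row (assumes aligned columns)."""
    matches = synthetic.merge(source.drop_duplicates(), how="inner")
    return len(matches) / len(synthetic)

def close_value_ratio(source: np.ndarray, synthetic: np.ndarray,
                      threshold: float) -> float:
    """Share of synthetic records whose nearest source record lies within
    `threshold` (Euclidean distance on 2-D numeric arrays). The threshold
    is an assumption to calibrate, not a universal setting."""
    dists = np.linalg.norm(synthetic[:, None, :] - source[None, :, :], axis=-1)
    return float((dists.min(axis=1) <= threshold).mean())
```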
Limitations of common metrics
Similarity-based metrics and average-case scores are useful for finding some problems, but they are poor proof of safety. Privacy is a worst-case question focused on whether any individual is exposed.
Similarity-based metrics
Measures such as nearest-neighbour similarity are intuitive, but research has shown they can miss serious privacy leakage and do not provide bounded privacy guarantees.
Average-case metrics like F1
Aggregate scores can hide a small group of highly vulnerable people. A high attack F1 score clearly signals a privacy failure, but a low score does not prove safety.
Differential Privacy and auditing
Differential Privacy
Differential Privacy is a property of the generation process, not the output dataset. Pure epsilon-DP is the strictest form, while approximate (epsilon, delta)-DP allows a small failure probability. Smaller epsilon values give stronger protection.
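A minimal sketch of the Laplace mechanism for a single counting query, which has sensitivity 1. This only illustrates how epsilon scales the noise for one release; it is not how a full DP synthetic data generator works, and repeated releases consume additional privacy budget.

```python
import numpy as np

def dp_count(values, predicate, epsilon: float, rng=None) -> float:
    """Release a count via the Laplace mechanism. A counting query has
    sensitivity 1, so noise with scale 1/epsilon satisfies pure
    epsilon-DP for this single release (illustrative, unaudited)."""
    rng = rng or np.random.default_rng()
    true_count = sum(1 for v in values if predicate(v))
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

# Smaller epsilon => larger noise => stronger protection, lower utility.
noisy = dp_count(range(100), lambda v: v >= 65, epsilon=0.5)
```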
Audit claims empirically
Real-world implementations can fail because of design flaws, incorrect assumptions, or bugs. Empirical auditing is still required even when a generator claims formal privacy guarantees.
Canary-based auditing
Inject carefully constructed artificial records into training data, train the generator, then test whether those canaries are detectable or reconstructable in the output. Detectable canaries are concrete evidence of memorisation or leakage.
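A simplified sketch of the canary workflow, assuming tabular data: construct implausible records, add them to the training data before running the generator, then measure how many reappear in the output. The column names and sentinel values are hypothetical.

```python
import pandas as pd

def make_canaries(n: int) -> pd.DataFrame:
    """Artificial records with deliberately implausible value combinations,
    so any reappearance in the output is very unlikely to be coincidence."""
    return pd.DataFrame({
        "age": [117] * n,  # out-of-range sentinel value
        "postcode": [f"CANARY{i:04d}" for i in range(n)],
    })

def canary_exposure(synthetic: pd.DataFrame, canaries: pd.DataFrame) -> float:
    """Fraction of canaries reproduced verbatim in the synthetic output;
    any non-zero value is concrete evidence of memorisation."""
    candidates = synthetic[canaries.columns].drop_duplicates()
    hits = canaries.merge(candidates, how="inner")
    return len(hits) / len(canaries)
```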
Practical considerations for privacy evaluation
The framework text emphasises context-aware evaluation and transparent reporting rather than a single mechanical checklist.
- Base evaluations on realistic quasi-identifiers that reflect likely adversary knowledge.
- Evaluate the entire dataset rather than only a pre-selected subset of records.
- Assess both membership disclosure and attribute disclosure, not just one attack surface.
- Empirically validate Differential Privacy claims, especially when the privacy budget is not close to zero.
- Report results across multiple synthetic data generation runs and keep worst-case outcomes visible, as in the sketch after this list.
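A minimal sketch of run-level reporting that keeps the worst case visible alongside the average. The per-run scores could be any attack success metric; the function name and shape of the summary are illustrative.

```python
import numpy as np

def summarise_attack_runs(per_run_success: list[float]) -> dict:
    """Report the worst observed outcome alongside the mean, so a single
    bad generation run is not hidden by averaging across runs."""
    scores = np.asarray(per_run_success)
    return {
        "n_runs": int(scores.size),
        "mean": float(scores.mean()),
        "worst_case": float(scores.max()),
    }
```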
Future directions and open challenges
- Better empirical privacy metrics that capture worst-case rather than average-case risk.
- More practical, automated, and reproducible privacy auditing tools.
- Clearer interpretation of epsilon and delta in operational settings.
- Better handling of cumulative privacy loss across repeated synthetic data releases.
- Stronger methods for time-series, longitudinal data, free text, and other complex data types.
The appendix concludes that privacy evaluation is not optional. Responsible practice depends on realistic threat modelling, transparent assumptions, empirical auditing, and a portfolio of complementary evidence to understand and manage residual risk.