Synthetic Health Data Governance Framework (SHDGF)

Step 2 - Assess & Prepare Source Data

Determine whether the original dataset is suitable for synthetic generation. Apply governance, data quality and privacy checks before data access.

DPDRDS

Governance Intent

Confirm lawful availability, readiness, and data quality before synthesis.

Decisions

Is source data lawfully owned and available for synthesis?

Is linkage to external datasets required?

Are data quality and representativeness adequate?

Why This Step

Ensure source data quality and fitness for purpose before synthesis. Poor-quality input leads to poor-quality synthetic data.

Prerequisites

  • Access to source data
  • Data dictionary or schema
  • Understanding of data collection methods

Time Estimate

45-90 minutes

Quality matters
Synthetic data can only be as reliable as the inputs. Address data quality issues before continuing to synthesis.

Data Quality Assessment

Evaluate readiness across core quality dimensions.
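
As an illustration only, two of the quality dimensions (completeness and timeliness) could be profiled along the lines of the sketch below. This is a minimal example under stated assumptions, not the framework's prescribed method: the records, the field names (`patient_id`, `age`, `sex`, `last_updated`), the 365-day staleness threshold and the helper functions are all hypothetical.

```python
from datetime import date

# Hypothetical source records; None marks a missing value.
records = [
    {"patient_id": 1, "age": 34, "sex": "F", "last_updated": date(2024, 5, 1)},
    {"patient_id": 2, "age": None, "sex": "M", "last_updated": date(2023, 1, 15)},
    {"patient_id": 3, "age": 51, "sex": None, "last_updated": date(2024, 3, 20)},
    {"patient_id": 4, "age": 47, "sex": "F", "last_updated": date(2021, 11, 2)},
]


def completeness(rows):
    """Return the fraction of non-missing values for each field."""
    fields = rows[0].keys()
    return {f: sum(r[f] is not None for r in rows) / len(rows) for f in fields}


def stale_fraction(rows, as_of, max_age_days, field="last_updated"):
    """Return the fraction of dated rows older than max_age_days at as_of."""
    dated = [r for r in rows if r[field] is not None]
    if not dated:
        return 0.0
    stale = sum((as_of - r[field]).days > max_age_days for r in dated)
    return stale / len(dated)


completeness_report = completeness(records)
# The 365-day threshold is an assumption; choose one that fits the use case.
stale_share = stale_fraction(records, as_of=date(2024, 6, 1), max_age_days=365)

print(completeness_report)
print(stale_share)
```

Results like these would feed the profiling report rather than replace it. Acceptable thresholds depend on the intended use case and should be agreed before synthesis.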

Fitness for Purpose

Confirm the dataset can support the intended synthesis use case.

Required Evidence

  • Ownership confirmation
  • Profiling report (Appendix 6)
  • Cross-custodian agreements

Step completion requirements


Finish these before marking the step complete:

  • Data completeness assessed
  • Data accuracy verified
  • Data consistency checked
  • Data timeliness evaluated
  • Representativeness of source data confirmed
  • And 4 more required items.
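
One possible way to check the representativeness item above is to compare category shares in the source data against a reference population. The sketch below is illustrative only: the reference shares, the sample values, the 5-percentage-point tolerance and the `share_gaps` helper are hypothetical and do not come from the framework.

```python
from collections import Counter


def share_gaps(values, reference_shares):
    """Return observed share minus reference share for each reference category."""
    counts = Counter(values)
    total = sum(counts.values())
    return {
        category: counts.get(category, 0) / total - ref
        for category, ref in reference_shares.items()
    }


# Hypothetical source column and reference population shares.
source_sex = ["F"] * 6 + ["M"] * 4
reference = {"F": 0.5, "M": 0.5}

gaps = share_gaps(source_sex, reference)

# Flag categories whose share differs from the reference by more than
# the tolerance (0.05 is an assumed value, not a framework requirement).
tolerance = 0.05
flagged = sorted(c for c, gap in gaps.items() if abs(gap) > tolerance)

print(gaps)
print(flagged)
```

A flagged category does not by itself make the data unsuitable. It signals that the gap should be documented and weighed against the intended use case.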

Resources

  • Five Safes Framework (Appendix 10)
  • Technical Assessment Template (Appendix 6)
  • De-identification techniques and privacy evaluation in synthetic data (Appendix 7)