NCHS Public-use Synthetic Linked Data
Key points
- The National Center for Health Statistics (NCHS) conducted a pilot project to create public-use synthetic linked data files.
- Synthetic data are designed to maintain the properties of restricted-use data while protecting survey respondents’ privacy.
- NCHS has developed a verification process so researchers can confirm results based on synthetic linked data.
Overview
NCHS links survey data with administrative data from multiple government agencies to create new data resources that help scientists and policymakers answer complex health questions.
Linking two or more data sources also increases the risk that confidential information about survey respondents could be identified. Because of this risk, linked data must usually be accessed through the NCHS network.
To make linked data easier to access, NCHS has piloted a project that creates synthetic linked data files. Synthetic data are created using statistical techniques to maintain properties of the original data.
NCHS has created a verification process so that researchers can confirm synthetic data results against the original data.
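The general idea of synthesis followed by verification can be illustrated with a minimal sketch. This is not the NCHS synthesis method or its verification process. It assumes a toy two-variable dataset and a simple bivariate normal model, and every name and value in it (`pearson`, `synthesize`, the simulated "original" records) is hypothetical.

```python
import math
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def synthesize(xs, ys, n, rng):
    """Draw n synthetic (x, y) pairs from a bivariate normal fitted to the input.

    Assumption for illustration only: real synthesis models are far richer.
    """
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    r = pearson(xs, ys)
    out_x, out_y = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        out_x.append(mx + sx * z1)
        # Mix z1 and z2 so the synthetic pair has correlation r.
        out_y.append(my + sy * (r * z1 + math.sqrt(1 - r * r) * z2))
    return out_x, out_y


rng = random.Random(0)

# Simulated stand-in for restricted-use data; not real NHIS, HUD, or CMS values.
orig_x = [rng.gauss(50, 10) for _ in range(2000)]
orig_y = [0.5 * x + rng.gauss(0, 5) for x in orig_x]

# No synthetic record is a copy of an original record.
syn_x, syn_y = synthesize(orig_x, orig_y, 2000, rng)

# Toy "verification": compute the same estimate on both files and compare.
orig_r = pearson(orig_x, orig_y)
syn_r = pearson(syn_x, syn_y)
print(f"original r = {orig_r:.3f}, synthetic r = {syn_r:.3f}")
```

In this sketch the synthetic records reproduce the means, spreads, and correlation of the simulated original closely but not exactly, which is why comparing a synthetic estimate against the same estimate from the original data is useful.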
Synthetic linked 2018 NHIS-HUD-CMS data
This synthetic linked file combines selected variables from the 2018 National Health Interview Survey (NHIS) with selected linked data from Centers for Medicare and Medicaid Services (CMS) Medicare health insurance records and from Department of Housing and Urban Development (HUD) records on participation in federal housing assistance programs.
Researchers can use the data on this file to study the effect of housing assistance on hospitalizations and other types of healthcare use.
Data files and data dictionaries
- Synthetic linked 2018 NHIS-HUD-CMS data dictionary [PDF – 240 KB]
- Software programs:
Synthetic data methods and analytic support
Verification of synthetic data results
To request that NCHS verify your synthetic data results, email a completed Template for Requesting Verification Metrics for the Synthetic Linked NHIS-HUD-CMS Data to datalinkage@cdc.gov.