NCHS Public-use Synthetic Linked Data

Key points

  • The National Center for Health Statistics (NCHS) conducted a pilot project to create public-use synthetic linked data files.
  • Synthetic data are designed to maintain the properties of restricted-use data while protecting survey respondents' privacy.
  • NCHS has developed a verification process so researchers can confirm results based on synthetic linked data.

Overview

The National Center for Health Statistics (NCHS) links survey data with administrative data from multiple government agencies to create new data resources that help scientists and policymakers answer complex health questions.

Linking two or more data sources also increases the risk that confidential information could be disclosed. Because of this risk, linked data must usually be accessed through the NCHS Research Data Center network.

To make linked data easier to access, NCHS has piloted a project that creates synthetic linked data files. Synthetic data are created using statistical techniques to maintain properties of the original data.
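As a rough illustration of the general idea only, and not a description of the techniques NCHS used, the sketch below fits a simple normal model to a small set of made-up values and draws synthetic values from it. The function name `synthesize`, the sample values, and the choice of a normal model are all assumptions made for this example. The synthetic values are new draws rather than copies of original records, yet their mean and standard deviation stay close to those of the original values.

```python
import random
import statistics


def synthesize(values, n, seed=0):
    """Draw n synthetic values from a normal model fitted to `values`.

    Hypothetical helper for illustration; real synthetic data methods
    model many variables and their relationships jointly.
    """
    rng = random.Random(seed)            # fixed seed so the run is repeatable
    mu = statistics.mean(values)         # property to preserve: the mean
    sigma = statistics.stdev(values)     # property to preserve: the spread
    return [rng.gauss(mu, sigma) for _ in range(n)]


# Made-up stand-in for a confidential numeric variable (not real data).
original = [2.0, 3.5, 4.0, 4.5, 5.0, 6.5, 7.0, 8.5]
synthetic = synthesize(original, n=5000)

# Compare summary statistics of the original and synthetic values.
print("mean :", round(statistics.mean(original), 2), round(statistics.mean(synthetic), 2))
print("stdev:", round(statistics.stdev(original), 2), round(statistics.stdev(synthetic), 2))
```

A single-variable model like this one preserves only the properties it was fitted on, which is one reason results from synthetic data benefit from being checked against the original data.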

NCHS has created a verification process so that researchers can confirm synthetic data results against the original data.

Synthetic linked 2018 NHIS-HUD-CMS data

This synthetic linked file includes selected variables from the 2018 National Health Interview Survey (NHIS), along with selected linked data from Centers for Medicare & Medicaid Services (CMS) Medicare health insurance records and on participation in Department of Housing and Urban Development (HUD) federal housing assistance programs.

Researchers can use the data on this file to study the effect of housing assistance on hospitalizations and other types of healthcare use.

Verification of synthetic data results

To request that NCHS verify your synthetic data results, email a completed Template for Requesting Verification Metrics for the Synthetic Linked NHIS-HUD-CMS Data to datalinkage@cdc.gov.