Supplemental Information to the NIH Policy for Data Management and Sharing: Protecting Privacy When Sharing Human Research Participant Data
Notice Number:
NOT-OD-22-213

Key Dates

Release Date:

September 21, 2022

Related Announcements

NOT-OD-22-214 – Supplemental Information to the NIH Policy for Data Management and Sharing: Responsible Management and Sharing of American Indian/Alaska Native Participant Data

NOT-OD-21-013 – Final NIH Policy for Data Management and Sharing

NOT-OD-21-014 – Supplemental Information to the NIH Policy for Data Management and Sharing: Elements of an NIH Data Management and Sharing Plan

NOT-OD-21-015 – Supplemental Information to the NIH Policy for Data Management and Sharing: Allowable Costs for Data Management and Sharing

NOT-OD-21-016 – Supplemental Information to the NIH Policy for Data Management and Sharing: Selecting a Repository for Data Resulting from NIH-Supported Research

Issued by

Office of The Director, National Institutes of Health (OD)

Purpose

NIH promotes the responsible sharing of scientific data consistent with protecting research participant privacy. To advance efforts under its new Data Management and Sharing Policy (DMS Policy), NIH is providing supplemental information to assist researchers in addressing privacy considerations when sharing human research participant data. This information is not intended to provide a guide for compliance with regulatory requirements, nor does it establish binding rules for NIH awardees; instead, it provides a set of principles, best practices, and points to consider for creating a robust framework for protecting the privacy of research participants when sharing data.

Background

Effective data stewardship and protection of human research participant (hereinafter “participant”) privacy are achieved in tandem through responsible scientific data sharing practices. Accordingly, NIH has developed supplemental information to the DMS Policy to assist researchers in responsible data sharing by establishing 1) operational principles for protecting participants’ privacy when sharing scientific data, 2) best practices for implementing these principles, and 3) points to consider for choosing whether to designate scientific data for controlled access.

The operational principles, best practices, and points to consider are intended to address the sharing of both identifiable and de-identified data, as well as data that have been obtained either with consent or where no consent was required. This information is not presented as part of a sequential process nor is it intended to be a how-to guide for de-identification. This supplemental information does not address the use of specimens in research or data security standards, although such standards may apply to research subject to the DMS Policy.[1] Researchers and institutions are also expected to follow all other applicable federal, Tribal, state, and local laws, regulations, and policies that govern research involving human participants and the sharing and use of scientific data derived from participants. For example, NIH has specific privacy expectations under the Genomic Data Sharing (GDS) Policy that should be followed for research subject to that Policy.[2] Data repositories may also establish specific requirements for submission of data.

NIH recommends that these operational principles, best practices, and points to consider be incorporated as early as possible into the research process. Prospective incorporation of this framework facilitates research planning and communication of plans to participants, as encouraged in the DMS Policy. DMS Plans can be updated as necessary with NIH approval. NIH encourages coordination and communication with NIH Program Officers about DMS Plans. Note: These operational principles, best practices, and points to consider may be updated in response to changes in laws, regulations, policies, technology, science, and other factors.

Overview of Public Comments

This supplemental information was developed to address comments on the draft DMS Policy requesting further clarity and direction for researchers and their institutions about NIH’s principles and preferred practices regarding privacy.[3] It was revised in response to comments received on the “Request for Public Comments on DRAFT Supplemental Information to the NIH Policy for Data Management and Sharing: Protecting Privacy When Sharing Human Research Participant Data” (NOT-OD-22-131).[4] NIH considered all feedback in the development of the final supplemental information. Changes made in response to the public comments are summarized below.

Clarifying Scope and Relationship to Existing Regulations

Many commenters requested additional clarification on how this supplemental information might impact existing privacy expectations (such as those provided by the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule and the Common Rule). For example, some commenters expressed a need for more specific instructions on addressing de-identified data under various regulatory standards as well as the potential differences in protecting data that have been obtained with informed consent and when consent was not required. There were also related concerns that the supplemental information could conflict with existing regulations or introduce new binding rules.

The supplemental information has been revised to clarify its scope. This information is not intended to create new binding rules, but rather should be used by institutions to inform their decisions about whether and how to share participant data. The DMS Policy does not change institutions’ obligations under the Common Rule, the HIPAA Privacy Rule, or other binding rules. Clarifications have been made throughout to more clearly indicate when NIH is referring to existing requirements that are relevant to privacy protections and de-identification. The supplemental information has been revised to clarify that it is intended to address both identifiable and de-identified data (and not specimens), as well as data that have been obtained either with consent or where no consent was required. The operational principle about informed consent has been revised to indicate that consent is not itself a privacy protection, and thus is not a main focus of the supplemental information, but that consent can impact privacy protections by helping to establish the conditions for future data sharing and use.

Burden of Institutional Review of Data Sharing and Use

Several commenters expressed concern that the proposal for institutional review of the conditions for data sharing and use would be overly burdensome. In particular, commenters objected to the likely burden if institutions have to certify that all datasets have been appropriately de-identified. Other commenters questioned how reviews would be implemented into existing oversight processes and which component of an institution should be responsible for oversight. Finally, multiple commenters requested that the NIH develop standardized templates for data sharing and use agreements.

The supplemental information is intended to address sharing scientific data under the DMS Policy, including through agreements that govern data sharing. However, NIH does not intend to address specific features of institutional review processes. As such, the supplemental information encourages a robust but flexible approach to institutional review of data sharing. The supplemental information has been revised to encourage institutional review primarily when sharing data broadly, such as through repositories. Both the operational principle and corresponding best practice describing institutional review have also been revised to emphasize that no specific office, process, or component of an institution is recommended to be used for institutional review. Reviews need only be conducted by those with appropriate expertise and institutional roles. NIH also agrees that templates and standardized agreements for data sharing and use have potential value. The supplemental information refers to existing resources on templates, but NIH may develop additional templates that address the supplemental information and the DMS Policy.

Additional Details on De-Identification

Numerous commenters requested additional examples and more concrete guidance on how to appropriately de-identify data. In particular, commenters requested more details on the statistical methods for de-identification referenced in the section on Best Practices for Protecting Participant Privacy When Sharing Scientific Data. Some commenters suggested that documenting methods of de-identification and communicating those methods to others be added as a component under the best practices. There were also multiple commenters suggesting that “indirect identifiers” were not adequately addressed in the supplemental information. These commenters also suggested that indirect identifiers may pose particular challenges for behavioral research and sharing qualitative data.

Additional details and examples have been provided throughout the supplement, where possible. The supplemental information is not intended to be a how-to guide for de-identification. The supplemental information has been revised to include additional references to resources for de-identification. In response to comments about “indirect identifiers,” a component has also been added to the best practices on the challenges posed by information that can allow inferences to be made about a participant’s identity, even if the information is de-identified according to HIPAA or the Common Rule.

Understanding Certificates of Confidentiality

A number of commenters expressed the view that Certificates of Confidentiality remain untested and that their implementation is unclear. Given these potential limitations, these commenters were concerned that Certificates may not adequately protect scientific data. Other commenters noted that there are many additional legal protections beyond Certificates of Confidentiality that also warrant more detailed discussion. Lastly, some commenters suggested that emphasis should be placed on communicating the application of Certificates (e.g., to repositories) to ensure proper downstream protections.

While Certificates may be largely untested, they remain important safeguards against unauthorized disclosure of identifiable information used in research. Certificates are emphasized in the supplemental information because NIH has a Certificates of Confidentiality Policy that may apply to some types of shared data. The supplemental information has been revised to include references to other existing, applicable regulations. The discussion of Certificates has also been revised to emphasize the importance of communicating to repositories and other downstream users when a Certificate applies. Finally, the discussion has been revised to emphasize that Certificates are a legal requirement in many cases.

Data Not Subject to Typical Research Protections

Some commenters suggested that the application of “strict privacy considerations” to data from “non-traditional research contexts” was confusing. They noted it was unclear whether this meant that additional protections were required, beyond those applied to typical research contexts. It was also unclear why the examples mentioned counted as “non-traditional” and how researchers would be able to apply protections to data that were already publicly available.

The principle has been revised to better reflect the underlying concept, which is that all data used in research deserve to be protected, regardless of their original source. Even data that are publicly available deserve protections in research, because of the unanticipated ways in which researchers may use them.

Clarification on Sharing Data Openly

While there was general support for the points to consider concerning controlled access, some commenters requested corresponding guidance on when to share data openly.

The supplemental information is primarily designed to help people think through situations that might involve the need to protect data. However, NIH agrees it is helpful to have guidance on when additional protections or controls are unnecessary. The supplemental information has been revised to provide factors to consider when determining whether to share scientific data without access controls, which include if individuals explicitly consented to open sharing of their data and if data are de-identified and institutional review has determined that the data are very low risk.

Sharing Sensitive Data Through Controlled Access

A number of commenters requested a different approach to the treatment of sensitive data. Some commenters suggested that other harms should be considered in the definition of “sensitive,” including the potential for community harms and security threats. The previously mentioned concerns about “indirect identifiers” were also mentioned in this context, with commenters explaining that the presence of such information can increase sensitivity. Finally, some commenters encouraged a more nuanced approach that would allow for sensitive data to be shared in certain circumstances. These commenters expressed concern that marginalized groups would be excluded from research just because their data are sensitive. With appropriate data management, these commenters suggested, it may be possible to share sensitive data appropriately.

The discussion of sensitive data in the supplemental information is not meant to be exhaustive of all the factors that could indicate the need for controls. The framing is intended to provide flexibility for protecting sensitive data, depending on the research context and the impact on groups and communities. As such, no revisions have been made to the framing of what may be “sensitive.” NIH agrees that sensitive data can be shared in certain circumstances with appropriate protections. The supplemental information has been revised to clarify that sensitive data may be able to be shared with adequate de-identification and that researchers should engage with communities affected by sharing sensitive data to help determine appropriate protections. NIH also acknowledges that additional consideration is needed for information that allows inferences to be made about a participant’s identity when combined with other information. The supplemental information has been revised to indicate the need for taking such information into account when considering designating scientific data for controlled access.

Supplemental Information to the NIH Policy for Data Management and Sharing: Protecting Privacy When Sharing Human Research Participant Data

Operational Principles for Protecting Participant Privacy When Sharing Scientific Data

Respect for and protection of participant privacy is fundamental to the biomedical and behavioral research enterprise. NIH and the institutions it funds must protect the privacy and confidentiality of every participant as described in applicable informed consent and in line with all applicable laws, regulations, and policies. In developing a Data Management and Sharing Plan for NIH-funded or supported research, it is paramount that researchers uphold the following principles in their Plans and throughout the research project.[5]

1. Proactive assessment of protections. Researchers and institutions should proactively assess the protections needed for sharing scientific data from participants, including determining whether sharing should be restricted through controlled access (see section on Points to Consider for Choosing Whether to Designate Scientific Data for Controlled Access).[6] Privacy protections should be considered regardless of whether the data meet technical and/or legal definitions of “de-identified” and can legally be shared without additional protections (e.g., if the data are being used without informed consent because the research does not meet the definition of “human subjects research” under the Common Rule).

2. Clear communication of data sharing and use in consent forms. Researchers and institutions should develop robust consent processes that prioritize clarity regarding future sharing and use of scientific data, including limitations on future use, and general aspects regarding how data will be managed (see Informed Consent for Secondary Research with Data and Biospecimens: Points to Consider and Sample Language for Future Use and/or Sharing).[7] While informed consent is not itself a privacy protection, it does provide the opportunity to establish the conditions for sharing and using scientific data (including whether data can be shared openly or should be shared through controlled access). Scientific data that are collected, shared, or used without informed consent also deserve privacy considerations.

3. Consideration of justifiable limitations to sharing data. There may be justifiable limitations to sharing scientific data under the DMS Policy. The DMS Policy outlines factors that might limit sharing, including when sharing would compromise the privacy or safety of participants and when limitations are explicitly described in informed consent documents.[8] In these instances, researchers should outline these justifications in their Data Management and Sharing Plans. In addition, limitations on sharing and use should be conveyed with the data when they are transferred, such as when sharing through repositories to downstream users (see section on Best Practices to Establish Scientific Data Sharing and Use Agreements).

4. Institutional review of the conditions for data sharing. Institutions should review the conditions for sharing data, including whether proposed limitations on the future use of data are appropriate and whether risks have been considered, and communicate this information to repositories and/or users (see section on Best Practices to Establish Scientific Data Sharing and Use Agreements). Such review helps establish the conditions under which future sharing will occur and enables consistent, clear, and appropriate sharing with downstream users. Review can take different forms and be conducted by different offices or components of an institution (such as an Institutional Review Board, Privacy Board, or individuals with appropriate roles and expertise).

5. Protections for all data used in research. Scientific data used in research warrant privacy considerations regardless of whether the data are collected from non-research settings or settings that may be subject to different privacy standards than those traditionally applied to research data, such as social media and public health surveillance. Even if researchers cannot set the standards for collecting such data, they should apply protections for sharing scientific data consistent with those outlined in this supplemental information.

6. Remaining vigilant regarding data misuse. Responsible data sharing practices require a commitment from the entirety of the biomedical and behavioral research enterprise. Researchers and institutions should remain vigilant regarding potential misuse and work in concert with NIH to prevent unauthorized use of scientific data from NIH-supported repositories. In addition, NIH is committed to enforcing the terms of its data use agreements.

Best Practices for Protecting Participant Privacy When Sharing Scientific Data

NIH acknowledges there are multiple, effective strategies for achieving privacy protection in the context of the DMS Policy. Building upon the operational principles described above, the following best practices, when implemented together, along with consideration of the Points to Consider for Choosing Whether to Designate Scientific Data for Controlled Access (below), provide a robust privacy framework.

1. Apply Appropriate De-identification. NIH recommends scientific data be de-identified to the greatest extent that maintains sufficient scientific utility. Unless participants explicitly consent to sharing identifiable data (e.g., under the broad consent provision of the Common Rule[9]), data should generally be shared only in a de-identified format. Researchers and institutions should consider the following strategies and their appropriateness given their particular research and scientific data:

  • Rely on the standards for identifiability outlined in both the Common Rule[10] (e.g., participant identity cannot “readily be ascertained by the investigator”) and in the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule (i.e., Expert Determination[11] or Safe Harbor[12]), regardless of whether these rules apply to the sharing, disclosure, or subsequent use of data. Under HIPAA’s Expert Determination standard, researchers can employ advanced statistical or computational methods to de-identify data and maintain privacy. Researchers can find additional resources to assist with de-identification through guidance on Expert Determination and Safe Harbor provided by the Department of Health and Human Services, NIST’s tools for de-identification, guidance on images from the National Cancer Imaging Archive, and the National Library of Medicine’s clinical text de-identification tool (NLM-Scrubber).[13]
  • There may be privacy risks associated with sharing information that is not considered identifiable by applying the Common Rule and HIPAA standards. For example, information can be present in data, even when de-identified to the standard of the Common Rule or the HIPAA Privacy Rule’s Safe Harbor, that can allow inferences to be made about a participant’s identity when combined with other information (e.g., educational, employment, or general medical history information). Data from qualitative research may pose particular challenges in removing information that may allow such inferences. Researchers could consider options such as modifying this information and/or sharing data only through controlled access.
  • Methods used to de-identify scientific data should be documented for communicating to downstream users.
  • In some cases, scientific utility may be lost if shared data are de-identified. It may consequently be justifiable in certain cases to share scientific data under the DMS Policy that meet a legal or regulatory standard for identifiability.[14] It is generally acceptable to share identifiable data when participants provide their explicit consent to do so (in addition to meeting other applicable legal or regulatory requirements for sharing identifiable data).

2. Establish Scientific Data Sharing and Use Agreements. NIH recommends the use of scientific data sharing and/or use agreements, preferably standardized, when sharing data through repositories as proposed in Data Management and Sharing Plans.[15] Agreements for sharing data through repositories are recommended, as they establish the conditions that enable consistent, clear, and appropriate sharing with downstream users. Agreements are also important for users of controlled-access data to promote common understanding of responsibilities and expectations in use of participant data. Agreements should be considered even if scientific data are de-identified. Key elements that promote the privacy of participants in such agreements include:

  • Oversight. Agreements for submitting data to repositories should include assurance that an institutional oversight body has reviewed and considered the risks of data sharing, that sharing is consistent with informed consent (as applicable), and that the protections in place are appropriate (such as de-identification, including the standards and methods used). There is no particular institutional office or component that is recommended to conduct these reviews, as long as the individual(s) involved possess the appropriate expertise and institutional role(s).
  • Responsibilities. Agreements for data users should delineate responsibilities of all parties having access to the data and clearly inform parties about data use limitations as well as responsibilities regarding privacy and confidentiality, including those required by Certificates of Confidentiality,[16] as applicable.
  • Restrictions. Agreements should explicitly outline sharing limitations and explicitly prohibit attempts to re-identify and/or recontact participants or their family members unless there is explicit agreement to do so. Such restrictions should be communicated to all users and managers of the data. Methods used to de-identify data and any relevant risk assessments should also be communicated with the data, to enable downstream users to understand whether applied methods are sufficiently protective for downstream uses.

3. Understand and Communicate Legal Protections Against Disclosure and Misuse. A variety of federal, Tribal, state, and local laws impose obligations on the disclosure and use of scientific data from research (including HIPAA and the Common Rule, mentioned above, as well as state laws that may prohibit disclosure of certain types of information). Researchers and their institutions should understand the applicability of relevant laws, regulations, and policies on their research.

  • Researchers and institutions are particularly encouraged to understand the requirements and legal protections provided by the NIH Certificates of Confidentiality Policy.[17] Recipients of data, including repositories, should be informed when scientific data are covered by a Certificate, and should be reminded that such data and all copies are covered by Certificates in perpetuity. Certificates of Confidentiality protect the privacy of research participants by prohibiting disclosure of protected information for non-research purposes to anyone not connected with the research except in specific situations, such as when there is consent to do so.
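As an illustration of the de-identification best practice above, which notes that information such as educational, employment, or general medical history can allow inferences about a participant's identity when combined, the following is a minimal sketch of one common screening idea, often called k-anonymity. This supplemental information does not name or endorse that technique; the function name `small_groups`, the field names, the threshold `k`, and the records are all hypothetical and made up for illustration. The sketch only counts how many records share each combination of such fields and flags combinations shared by fewer than `k` records. It is not a substitute for the Common Rule or HIPAA standards, nor for institutional review.

```python
from collections import Counter


def small_groups(records, quasi_identifiers, k):
    """Return combinations of quasi-identifier values shared by fewer than k records.

    Hypothetical helper for illustration only; not an NIH or HIPAA method.
    """
    counts = Counter(
        tuple(record[field] for field in quasi_identifiers) for record in records
    )
    return {combo: n for combo, n in counts.items() if n < k}


# Made-up records with direct identifiers already removed (assumption).
records = [
    {"education": "PhD", "occupation": "teacher", "age_band": "40-49"},
    {"education": "PhD", "occupation": "teacher", "age_band": "40-49"},
    {"education": "BA", "occupation": "nurse", "age_band": "30-39"},
    {"education": "BA", "occupation": "nurse", "age_band": "30-39"},
    {"education": "MD", "occupation": "pilot", "age_band": "60-69"},
]

# Flag any combination of the three fields that appears only once.
risky = small_groups(records, ["education", "occupation", "age_band"], k=2)
print(risky)
```

A combination flagged this way might prompt the options the best practice already mentions: modifying the information (for example, by using broader categories) or sharing the data only through controlled access. The choice of fields and of `k` is a judgment call that depends on the research context.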

Points to Consider for Choosing Whether to Designate Scientific Data for Controlled Access

The DMS Policy expects researchers to consider whether access to scientific data from participants should be controlled (i.e., measures such as requiring data requesters to verify their identity and the appropriateness of their proposed research use to access protected data), even if de-identified and lacking explicit limitations on subsequent use.[18] The points below are intended to assist researchers when considering whether controlled-access repositories may be needed to protect participant privacy.[19] Note that controls may be needed for data at any level of processing (e.g., raw or fully cleaned data), from any source (e.g., research, clinical, or public health data), and for all types of research data (e.g., quantitative, qualitative, imaging, sensor-based). The framework provided by the operational principles and best practices should still be considered when deciding whether to designate scientific data for controlled access. Researchers should consider sharing participants’ scientific data through controlled-access repositories if data:

1. Have explicit limitations on subsequent use, such as those imposed by laws, regulations, policies, informed consent, and agreements.

2. Could be considered sensitive, such as including information regarding potentially stigmatizing traits, illegal behaviors, or other information that could be perceived as causing group harm or used for discriminatory purposes. Sensitive data may also include data from individuals, groups, or populations with unique attributes that increase the risk of re-identification. Even if data are sensitive, it may be possible to de-identify the data in ways that would allow appropriate sharing. When possible, researchers are encouraged to engage with communities affected by sharing sensitive data to discuss approaches for appropriate use and risk mitigation.

3. Cannot be de-identified to established standards or for which the possibility of re-identification cannot sufficiently be reduced. For example, datasets de-identified to regulatory standards that nonetheless pose risks due to information that can still allow inferences to be made about participants (discussed above in the Best Practice on De-identification) may not be able to be shared openly. Access controls, among other measures, may be appropriate to further mitigate the risk of re-identification.[20]

4. Pose risks to participant privacy if released without controls on access, due to previously unanticipated approaches or technologies that become known. When such risks are identified prior to sharing the scientific data and were not outlined in the original Data Management and Sharing Plan, any changes to the Plan should be communicated to NIH consistent with the DMS Policy.

In certain cases, it may be appropriate to share scientific data without access controls. Factors to consider when choosing whether to share data openly include the following:

1. Participants explicitly consent to share scientific data openly without restrictions.

2. Scientific data are de-identified and institutional review has determined that they pose very low risk when shared and used, including any risks posed by the presence of information that can allow inferences to be made about a participant’s identity when combined with other information.

References

[1] Relevant standards and policies that may apply include the HHS Policy for Preparing for and Responding to a Breach of Personally Identifiable Information (PII) (https://www.hhs.gov/web/governance/digital-strategy/it-policy-archive/hhs-policy-preparing-and-responding-breach.html) and the National Institute of Standards and Technology’s (NIST) Special Publications on Computer Security (https://csrc.nist.gov/publications/sp800).

[2] NIH Genomic Data Sharing Policy. https://grants.nih.gov/grants/guide/notice-files/not-od-14-124.html

[3] Compiled Public Comments on a DRAFT NIH Policy for Data Management and Sharing and Supplemental DRAFT Guidance. https://osp.od.nih.gov/wp-content/uploads/RFI_Final_Report_Feb2020.pdf

[4] Request for Public Comments on DRAFT Supplemental Information to the NIH Policy for Data Management and Sharing: Protecting Privacy When Sharing Human Research Participant Data. https://grants.nih.gov/grants/guide/notice-files/NOT-OD-22-131.html

[5] NIH’s recommended considerations and best practices for responsible sharing of American Indian/Alaska Native data under the DMS Policy can be found in the Supplemental Information to the NIH Policy for Data Management and Sharing: Responsible Management and Sharing of American Indian/Alaska Native Participant Data (NOT-OD-22-214).

[6] “Controlled access” and “access controls” refer to measures such as requiring data requesters to verify their identity and the appropriateness of their proposed research use to access protected data.

[7] Informed Consent for Secondary Research with Data and Biospecimens: Points to Consider and Sample Language for Future Use and/or Sharing. https://osp.od.nih.gov/wp-content/uploads/Informed-Consent-Resource-for-Secondary-Research-with-Data-and-Biospecimens.pdf

[8] FAQ on justifiable reasons for limiting sharing of data under the DMS Policy: https://sharing.nih.gov/faqs#/data-management-and-sharing-policy.htm?anchor=56549.

[9] 45 CFR 46.116(d)

[10] 45 CFR 46.102(e)(5)

[11] 45 CFR 164.514(b)(1)

[12] 45 CFR 164.514(b)(2)

[13] Resources listed here include guidance on de-identification from the Department of Health and Human Services (https://www.hhs.gov/hipaa/for-professionals/privacy/special-topics/de-identification/index.html), NIST’s tools for de-identification (https://www.nist.gov/itl/applied-cybersecurity/privacy-engineering/collaboration-space/focus-areas/de-id), guidance on images from the National Cancer Imaging Archive (https://wiki.cancerimagingarchive.net/display/Public/Submission+and+De-identification+Overview), and the National Library of Medicine's clinical text de-identification tool (https://lhncbc.nlm.nih.gov/scrubber/).

[14] Final NIH Policy for Data Management and Sharing. https://grants.nih.gov/grants/guide/notice-files/NOT-OD-21-013.html

[15] As an example of a resource for community developed, standardized templates for data transfer and use agreements, see the Federal Demonstration Partnership. https://thefdp.org/default/committees/research-compliance/data-stewardship/. Note that not all templates and agreements may meet all principles outlined in this supplemental information, and that other templates and agreements may be developed in the future.

[16] Certificates of Confidentiality. https://grants.nih.gov/policy/humansubjects/coc.htm

[17] Certificates of Confidentiality. https://grants.nih.gov/policy/humansubjects/coc.htm

[18] See the Supplemental Information to the NIH Policy for Data Management and Sharing: Selecting a Repository for Data Resulting from NIH-Supported Research. https://grants.nih.gov/grants/guide/notice-files/NOT-OD-21-016.html

[19] Preferred repositories may be specified in Funding Opportunity Announcements or through NIH Institute and Center policy expectations.

[20] Other risk-mitigation measures that repositories can employ are listed in Section II of the Supplemental Information to the NIH Policy for Data Management and Sharing: Selecting a Repository for Data Resulting from NIH-Supported Research. https://grants.nih.gov/grants/guide/notice-files/NOT-OD-21-016.html.

Inquiries

Please direct all inquiries to:

NIH Data Management and Sharing Policy inbox
sharing@nih.gov