November 30, 2021
NOT-OD-14-124 - NIH Genomic Data Sharing Policy
NOT-OD-21-013 - Final NIH Policy for Data Management and Sharing
Office of The Director, National Institutes of Health (OD)
NIH is seeking public input on potential updates to the NIH Genomic Data Sharing Policy to keep pace with evolving scientific opportunities and stakeholder expectations.
The NIH Genomic Data Sharing (GDS) Policy (NOT-OD-14-124), issued in 2014, set forth expectations for ensuring the broad, responsible, and timely sharing of genomic research data generated from NIH-funded or conducted research. A landmark policy at the time, the GDS Policy focused on striking an appropriate balance between accelerating scientific research through rapid genomic data sharing and minimizing risk through formalizing expectations of informed consent and appropriate privacy protections. The GDS Policy has served the research community well, facilitating tens of thousands of genomics studies while preserving public trust in the biomedical research enterprise.
While the principles underlying the GDS Policy remain relevant for research today, genomic sequencing and related technologies are now considered integral to the conduct of biomedical research. Moreover, data sharing is widely recognized as a best practice for advancing research, and the promise of societal benefit continues to evolve. While NIH has adjusted implementation of the GDS Policy to keep pace with these changes, several key developments affecting the conduct of NIH-supported genomic research warrant reassessment of aspects of the GDS Policy. These developments include:
NIH remains committed to the principles espoused by the GDS Policy of maximizing scientific advances and public benefit by sharing genomic data and associated phenotypic data in a manner consistent with participants’ informed consent. However, in an effort to ensure NIH policies keep pace with evolving scientific opportunities and stakeholder expectations, NIH is seeking public feedback on how to ensure the GDS Policy remains consistent with this changing landscape. Note that while potential updates are under consideration, the GDS Policy will remain in effect in current form until further notice.
Request for Input
Respect for and protection of the interests of research participants are central tenets of the NIH GDS Policy and are fundamental to NIH’s stewardship of large-scale genomic data. Data derived from human research participants under the GDS Policy must be de-identified and provided with a random, unique code, the key to which is held by the submitting institution. NIH acknowledges that the concept of “identifiability” is a matter of ongoing deliberation within the scientific and bioethics communities. NIH relies on robust protections beyond de-identification, such as Institutional Review Board (IRB) consideration of risks associated with data submission, designating controlled access for certain data types, use of Data Access Committees to review requests, data use agreements to prohibit data disclosure and participant re-identification, and Certificates of Confidentiality[ii] to prohibit disclosure. As outlined in the NIH GDS Policy, the criteria for establishing de-identification are:
The reliance on the 18 identifiers enumerated at 45 CFR 164.514(b)(2) (the HIPAA Privacy Rule) as the only acceptable method under the GDS Policy for de-identification has recently presented several challenges. Certain data elements considered potentially identifiable may nonetheless have scientific utility. Examples include date ranges shorter than a year, which are especially useful when studying disease progression (e.g., with COVID-19), and location data at higher resolution than the regulatory standard allows (e.g., full ZIP codes or mobile location data), which may be valuable for studying the social determinants of health or environmental risk.
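As an illustration only, the trade-off described above can be sketched in Python. This is not NIH's or dbGaP's implementation. It assumes one record per participant, and the field names, the `deidentify` function, and the `populous_zip3` input are hypothetical. The sketch assigns each participant a random, unique code whose key stays with the submitting institution, reduces dates to the year, and reduces ZIP codes to their first three digits (or "000" for prefixes not listed as covering more than 20,000 people), in the style of the HIPAA identifier list.

```python
import secrets
from datetime import date


def deidentify(records, populous_zip3):
    """Split records into shareable rows and a privately held key.

    records: list of dicts with hypothetical fields
        "participant_id", "zip", "diagnosis_date" (datetime.date), "genotype".
    populous_zip3: set of 3-digit ZIP prefixes assumed to cover more than
        20,000 people (supplied by the caller, not looked up here).
    """
    key = {}     # random code -> original ID; stays with the submitting institution
    shared = []  # coarsened rows that carry only the random code
    for rec in records:
        code = secrets.token_hex(8)
        while code in key:  # regenerate in the unlikely event of a collision
            code = secrets.token_hex(8)
        key[code] = rec["participant_id"]
        zip3 = rec["zip"][:3]
        shared.append({
            "code": code,
            "zip3": zip3 if zip3 in populous_zip3 else "000",
            "diagnosis_year": rec["diagnosis_date"].year,  # month and day dropped
            "genotype": rec["genotype"],
        })
    return shared, key


records = [
    {"participant_id": "P-001", "zip": "20892",
     "diagnosis_date": date(2020, 4, 17), "genotype": "AG"},
    {"participant_id": "P-002", "zip": "03601",
     "diagnosis_date": date(2020, 11, 2), "genotype": "GG"},
]
shared, key = deidentify(records, populous_zip3={"208"})
```

In the shared rows, both diagnoses collapse to the year 2020, so the roughly seven months separating them is no longer visible. That loss of resolution is the kind of scientific cost the paragraph above describes.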
Challenges have also arisen recently around data linkage. It is difficult to know in advance which data sources may add scientific value when combined, so it is not always possible to tell participants about data linkage during their initial consent. Linking data refers to connecting two or more data sources (often multiple studies) to bring together information about a person, enabling researchers to learn more about a participant or small group of participants. For example, a participant might enroll in a study that uses their electronic health record as well as a separate study that uses a sample of their blood, and the data about them from those studies could later be linked in new research for more powerful analyses. This challenge in prospectively informing participants about data linkage raises questions about respecting individuals’ autonomy and what participants understand about how their data will be used. Furthermore, data from multiple sources may not have been obtained under the same consent and de-identification expectations as the GDS Policy.
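A minimal sketch of what linkage means in practice follows, assuming (hypothetically) that both studies carry the same participant code. Real linkage may rely on other matching methods, and the `link` function and all field names and values here are invented for illustration.

```python
def link(ehr_rows, blood_rows):
    """Inner-join two studies on a shared participant code."""
    blood_by_code = {row["code"]: row for row in blood_rows}
    linked = []
    for row in ehr_rows:
        match = blood_by_code.get(row["code"])
        if match is not None:
            # Merge the two views of the same participant into one row.
            linked.append({**row, **match})
    return linked


# Hypothetical study extracts; only participant "a1" is in both.
ehr_study = [
    {"code": "a1", "diagnosis": "T2D"},
    {"code": "b2", "diagnosis": "asthma"},
]
blood_study = [
    {"code": "a1", "hba1c": 7.9},
    {"code": "c3", "hba1c": 5.2},
]

linked = link(ehr_study, blood_study)
# linked == [{"code": "a1", "diagnosis": "T2D", "hba1c": 7.9}]
```

The linked row says more about participant "a1" than either study does alone, which is both the scientific appeal of linkage and the source of the consent and privacy questions raised above.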
NIH seeks input on:
The rapid advance of genomic technologies, available at increasingly accessible cost, has enabled a wealth of large-scale genomic data and other associated data types. NIH has traditionally provided substantial capacity to the community for storing and managing access to human genomic data under the GDS Policy through dbGaP and a small number of other NIH-operated repositories.
To reduce the technical burden of analyzing genomic data, NIH has begun investing in a number of resources (i.e., beyond dbGaP) for storing, sharing, and analyzing human genomic and phenotypic data under the GDS Policy. These investments have resulted in an increasingly federated landscape of platforms and repositories, hosted both at NIH and awardee institutions. There is consequently a need to establish shared principles between NIH and external organizations that are supported by NIH to ensure that data protections are consistent with those provided by dbGaP and the terms of the GDS Policy.
Accordingly, NIH proposes principles derived from the GDS Policy and dbGaP practices that have been used as criteria to ensure that NIH-supported alternative resources hosting human data generated and shared under the GDS Policy maintain appropriate standards and protections. Note that these principles would provide expectations only for NIH-supported resources, and NIH is not proposing at this time that sharing of human genomic data in non-NIH-supported repositories or platforms would satisfy the GDS Policy’s expectations. These principles are also intended to be consistent with the criteria described in the supplemental information to the DMS Policy, “Selecting a Repository for Data Resulting from NIH-Supported Research” (NOT-OD-21-016). The principles include the following:
NIH seeks input on:
5. Data management and sharing principles for NIH-supported resources
In October 2020, NIH released the NIH Policy for Data Management and Sharing (DMS Policy) to promote the management and sharing of scientific data generated from NIH-funded or conducted research. Although released in October 2020, the DMS Policy does not take effect until January 25, 2023. The framework for DMS Plan submission and review, as well as specific considerations for data management and sharing practices, shall also be the default practice for those proposing research that is subject to the GDS Policy. To this effect, NIH intends to harmonize the GDS Policy and GDS Plan elements, submission, and review with the DMS Policy. Harmonization of the GDS and DMS Policies will ensure consistency in data sharing and management expectations, reduce administrative burden on the scientific community, and streamline and enhance compliance with NIH data sharing policies, while maintaining the principles of sharing large-scale genomic research data and protecting research participants’ interests and privacy.
To harmonize these policies, NIH proposes to make the following changes to the GDS Policy, GDS Plans, and GDS Plan submission and review:
In some cases, the GDS Policy’s earlier timelines for sharing have posed challenges for compliance. NIH seeks comment on harmonizing these timeline expectations by modifying GDS Policy expectations to be the same as DMS Policy expectations (i.e., no later than the time of publication or end of the performance period for unpublished data, whichever comes first). NIH Institutes, Centers, and Offices (ICOs) and programs will continue to be able to set earlier timelines for data sharing for specific projects if warranted.
This approach may have the consequence of sharing less data due to the definition of "scientific data" under the DMS Policy, which focuses on data of sufficient quality to validate and replicate findings, rather than the more expansive definition provided in the supplemental information to the GDS Policy. Through this change, NIH seeks to simplify compliance and focus sharing expectations on non-human genomic data that underlie research findings. While NIH ICOs and programs would be free to set more stringent expectations, NIH seeks input on the potential negative impacts of sharing non-human data consistent only with the DMS Policy.
Elements that will remain in the GDS Policy: The following expectations would remain in the GDS Policy because they achieve particular goals that are more specific than those outlined in the DMS Policy:
Additional changes to these provisions may be made to clarify or simplify language, harmonize with DMS Policy terminology or practices, or to reflect comments received from this request for information or other sources.
NIH seeks input on:
6. Harmonizing GDS and DMS Policies. Any aspect of the approach to harmonize GDS and DMS Policies and Plans described above, including for non-human genomic data.
7. GDS and DMS data sharing timelines. Whether the continued use of earlier submission expectations for human genomic data in the GDS Policy (e.g., submission of human data within three months of data generation) is needed, or whether timelines should be harmonized with the DMS Policy expectations (i.e., sharing of data no later than the time of publication or at the end of the performance period, whichever comes first), as described in the proposal above.
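The two timeline rules contrasted in item 7 can be sketched as simple date calculations. This is an illustration under stated assumptions, not an official compliance calculator. The function names are hypothetical, "three months" is interpreted as calendar months with the day clamped to the end of shorter months, and earlier project-specific timelines set by NIH ICOs are not modeled.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Return d shifted forward by whole calendar months, clamping the day."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def gds_example_submit_by(data_generated):
    """GDS-style example from item 7: three months after data generation."""
    return add_months(data_generated, 3)


def dms_share_by(publication, end_of_performance):
    """DMS-style rule: publication or end of performance period, whichever is first.

    publication is None for data that remain unpublished.
    """
    if publication is None:
        return end_of_performance
    return min(publication, end_of_performance)


# Hypothetical project dates.
generated = date(2023, 1, 15)
published = date(2024, 3, 1)
period_end = date(2025, 6, 30)

gds_date = gds_example_submit_by(generated)    # 2023-04-15
dms_date = dms_share_by(published, period_end)  # 2024-03-01
```

With these made-up dates, the GDS-style example falls almost eleven months before the DMS-style date, which gives a sense of the gap that harmonization would close.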
NIH recognizes that data types and analytical methods have advanced since the release of the GDS Policy in 2014. In some cases, non-genomic data types (e.g., proteomic and metabolomic data) may pose similar risks of re-identification as large-scale human genomic data and may warrant the additional protections afforded by the GDS Policy, such as the Policy’s specific de-identification expectations. Furthermore, institutions submitting human genomic data to NIH repositories are to review associated informed consent materials via IRBs or equivalent bodies and provide an Institutional Certification to the funding NIH ICO. These same protections or sharing expectations could potentially be applied to other specific high-value and/or potentially sensitive data types. Additionally, because the scope of the GDS Policy (e.g., large-scale) does not apply to certain studies, the protections of the GDS Policy discussed here are not uniformly applied.
With the implementation of the DMS Policy, NIH will soon expect researchers to maximize appropriate sharing of scientific data. However, some of the GDS Policy’s expectations for the level of data to be shared and the speed of data sharing will go beyond those expected under the DMS Policy. While the DMS Policy outlines the scope of data sharing in terms of those data needed to validate and replicate research findings, the GDS Policy refers to the submission of large-scale genomic data and associated phenotypic data based on level of processing. The value of this volume of data, and its potential reuse for a multitude of additional analyses, were key factors in establishing this sharing expectation, but as large-scale data become more common, there may be other data types that possess similar value for advancing NIH’s mission.
As stated in the GDS Policy, “[a]t appropriate intervals, NIH will review the types of research to which this Policy may be applicable.” As such, NIH seeks input on whether the protections of the GDS Policy should apply to research involving additional data types, and whether the expectations for the level and speed of data sharing are warranted for such research that would not otherwise be satisfied by the DMS Policy’s expectations.
As stated in the GDS Policy’s preamble, the Policy applies to research funded in part or in whole by NIH if NIH funding supports the generation of the genomic data. To ensure collaborations are consistent with the Policy’s goals, NIH seeks comment on clarifying that the GDS Policy applies to research funded in part or in whole by NIH that generates large-scale genomic data, even if NIH does not directly support the sequencing itself.
NIH seeks input on:
8. Types of research covered by the GDS Policy.
9. Data sharing expectations under the GDS Policy. Whether there are other types of research and/or data that warrant the data processing level and timeline expectations established by the GDS Policy (e.g., sharing lower levels of processed data, not just those of sufficient quality to validate and replicate findings as in the DMS Policy).
How to Submit a Response
Comments must be submitted at https://osp.od.nih.gov/rfi-updating-the-nih-genomic-data-sharing-policy. Responses will be accepted through February 28, 2022.
Responses to this RFI are voluntary and may be submitted anonymously. You may also voluntarily include your name and contact information with your response. Other than your name and contact information, please do not include in the response any personally identifiable information or any information that you do not wish to make public. Proprietary, classified, confidential, or sensitive information should not be included in your response. After the NIH Office of Science Policy (OSP) has finished reviewing the responses, the unredacted responses may be posted to the OSP website.
[i] Final NIH Policy for Data Management and Sharing (October 29, 2020). https://grants.nih.gov/grants/guide/notice-files/NOT-OD-21-013.html
[ii] Notice of Changes to NIH Policy for Issuing Certificates of Confidentiality (September 7, 2017). https://grants.nih.gov/grants/guide/notice-files/NOT-OD-17-109.html
[iii] NIH Genomic Data Sharing Policy (August 27, 2014). https://grants.nih.gov/grants/guide/notice-files/NOT-OD-14-124.html
[iv] NIH Security Best Practices for Controlled-Access Data Subject to the NIH Genomic Data Sharing (GDS) Policy (March 9, 2015). https://osp.od.nih.gov/wp-content/uploads/NIH_Best_Practices_for_Controlled-Access_Data_Subject_to_the_NIH_GDS_Policy.pdf
[v] NIST Risk Management Framework. Federal Information Security Modernization Act (FISMA) Background. https://csrc.nist.gov/projects/risk-management/fisma-background
[vii] Supplemental Information to the National Institutes of Health Genomic Data Sharing Policy (August 27, 2014). https://osp.od.nih.gov/wp-content/uploads/Supplemental_Info_GDS_Policy.pdf
[viii] The DMS Policy defines scientific data as “The recorded factual material commonly accepted in the scientific community as of sufficient quality to validate and replicate research findings, regardless of whether the data are used to support scholarly publications. Scientific data do not include laboratory notebooks, preliminary analyses, completed case report forms, drafts of scientific papers, plans for future research, peer reviews, communications with colleagues, or physical objects, such as laboratory specimens.”
[ix] Institutional Certifications. https://osp.od.nih.gov/scientific-sharing/institutional-certifications/
NIH Office of Science Policy