Update to NIH Management of Genomic Summary Results Access

Notice Number: NOT-OD-19-023

Key Dates
Release Date: November 01, 2018

Related Announcements


NOT-OD-17-110
NOT-OD-18-104
NOT-OD-07-088
NOT-OD-12-136
NOT-OD-14-124
NOT-OD-17-044
NOT-OD-22-198

Issued by
National Institutes of Health (NIH)

Purpose

The National Institutes of Health (NIH) is committed to advancing scientific discoveries while safeguarding the interests of study participants and maintaining public trust in biomedical research. Genomic summary results (GSR), previously referred to as aggregate genomic data or genomic summary statistics, are generated from primary analyses of genomic research. They convey information about genomic associations with traits or diseases across datasets rather than associations specific to any one individual research participant. Responsible sharing of GSR generated through the analysis[1] of NIH-supported research promotes maximum public benefit from the federal research investment by providing information crucial to the interpretation and application of genomic data in research and clinical practice.

In 2007, NIH issued a Policy for Sharing Data Generated through NIH-supported Genome Wide Association Studies (NIH GWAS Policy) and launched the Database of Genotypes and Phenotypes (dbGaP). In 2014, the NIH GWAS Policy was subsumed under the NIH Genomic Data Sharing (GDS) Policy, which applies to all large-scale genomic data generated from NIH-funded research. Under both policies, dbGaP and other NIH-designated data repositories have provided data through a two-tiered system of unrestricted access or controlled access. GSR were originally included among the summary-level information available through unrestricted access. However, a 2008 paper by Homer et al. demonstrated a statistical method with the potential to determine, using GSR, whether a specific individual was a member of a research group (e.g., a disease group), provided one had access to that individual’s whole genome information. NIH responded to this development by moving GSR into the controlled-access portions of NIH-designated data repositories.[2] At that time, the agency stated its intent to further assess the risks and benefits associated with unrestricted access to this type of information in light of the new methodology.

Since 2008, NIH has continued to consider the risks and benefits of access to GSR. Specifically, the agency has hosted two workshops and solicited public comments through requests for information on the topic. In 2012, NIH held a workshop entitled Establishing a Central Resource of Data from Genome Sequencing Projects to consider a wide range of issues related to aggregating genomic data. In 2016, the National Human Genome Research Institute convened a workshop entitled Sharing Aggregate Genomic Data to explicitly reconsider the risks and benefits associated with access to and use of GSR. NIH also solicited broad input on this topic in a Request for Information (RFI) in 2017. Based on feedback obtained through the workshop discussions and the RFI, NIH developed a proposal to update the management of GSR in NIH-designated data repositories to allow broader access to this information from most studies that are subject to the NIH GDS Policy. Under that proposal, interested users would have affirmed agreement with a statement regarding responsible use of GSR via a click-through mechanism. Institutions submitting genomic data to NIH-designated data repositories under the NIH GDS Policy would be expected to notify NIH of any studies with particular sensitivities, such as studies of potentially stigmatizing traits or of identifiable or isolated study populations. These studies would then be designated as "sensitive," and access to GSR from such datasets would remain under controlled access.

To obtain stakeholder feedback on the specific proposed data management update, NIH released a 30-day Request for Comment (RFC) on September 20, 2017, and, in response to stakeholder requests, reopened the RFC for an additional 15-day comment period on November 27, 2017. Because the workshops and previous RFI largely captured the perspective of researchers, NIH made a particular effort to reach out to research participants and the broader patient community to request their feedback.

Through the RFC, NIH sought stakeholder input on: the risks and benefits of broad sharing of GSR from most genomic studies, including use of the click-through agreement affirming responsible use; the risks and benefits of maintaining GSR from sensitive studies in controlled access; the proposed method for designating studies as "sensitive"; and general feedback on other topics within the proposed update.

Public Comments Received and NIH Response

NIH received comments from 109 national and international stakeholders. Overall, respondents were in favor of broader access to GSR. Of the comments received, the respondents primarily self-identified as scientific researchers (79%), members of the public (7%), and institutional officials (3%). The respondents primarily identified their organizations as universities (46%), nonprofit research organizations (22%), and biotechnology/pharmaceutical companies (11%).

Broad Access to Genomic Summary Results

The goal of the proposed data management update was to provide access to GSR through a data access model that is proportional to the risks and benefits posed by broad access to this type of information and that takes into account any study-specific elements that might increase privacy risks or potential for harm within a study population. Overall, respondents widely agreed that the benefits of expanded access to most GSR outweigh the potential risks. In particular, respondents highlighted the significant scientific value of GSR and noted that moving GSR from controlled access to an unrestricted access model would pose minimal or no risk to most participants. A minority of respondents cited potential risks related to participant protection, including re-identification of individuals within genomic datasets and potential inconsistency with participants’ understanding of what they agreed to during informed consent.

Based on these and previous public comments and additional internal deliberations, NIH is updating the access model to increase accessibility for GSR from most NIH-supported studies in a manner that promotes public benefit from the federal investment in genomics research while considering potential risk to research participants. This change is being made in recognition of the distinct risks posed by this type of information relative to individual-level genomic data, and to facilitate broad use of GSR for research or health purposes. NIH also anticipates that the updated model for GSR access will reduce requests for individual-level data, because GSR alone may be sufficient to address certain research questions, thereby reducing overall privacy risks related to genomic data sharing.

Designation of "Sensitive" Studies and Access to Genomic Summary Results from "Sensitive" Studies

Respondents expressed widespread support for the proposed update’s retention of GSR under controlled access for studies designated as "sensitive," though there was some disagreement about the appropriate authority to make such designations (e.g., institutions, NIH; see below). Respondents recognized that privacy risks related to broad access to GSR may be heightened for some study populations, such as those from isolated geographic regions, those with rare traits, or those with potentially stigmatizing traits, and supported more stringent protection mechanisms for GSR from such studies. However, a minority of respondents (all self-identified as scientific researchers) pointed out potential concerns about limiting access to GSR from "sensitive" studies. A key concern among these comments was that doing so could diminish the benefit achieved through the use of GSR, because GSR from only a subset of studies would be broadly available. Additionally, several of those commenting on this topic argued that retaining GSR from studies designated as "sensitive" under controlled access may subvert the wishes of participants, many of whom might want broad sharing of their data. Some of these respondents therefore felt that the proposed approach was too conservative.

NIH has considered all of the comments received, and this GDS Policy access update will retain the ability to designate certain studies as sensitive for the purposes of GSR access. GSR from studies so designated will be accessible only through controlled-access and remain subject to any data use limitations attached to the corresponding individual-level data.

Of the respondents who disagreed about who had the appropriate authority to make the sensitive designation, half commented in favor of institutional designation, and half opposed it or suggested additional considerations. Those who opposed institutional designation advocated for input or additional oversight from a separate, independent body, or the NIH, either to ensure adequate protection, or to ensure that the sensitive classification was used appropriately and consistently.

In this update to GSR access procedures, NIH has retained the original proposal that submitting institutions determine, for every incoming and previously submitted study, whether a dataset should be designated as "sensitive," in which case its GSR will be accessible only through submission of a standard data access request for the full study dataset. This process is consistent with other responsibilities that submitting institutions have before NIH accepts any dataset for distribution through NIH-designated data repositories (e.g., the delineation of any Data Use Limitations for future research use).

Rapid-Access Tier for Genomic Summary Results

The greatest number of comments received pertained to the proposed rapid-access tier, with the majority of respondents explicitly opposing the use of a click-through agreement in which the investigator attests to being a good steward of the data. These commenters felt that the click-through agreement was too conservative and would not meaningfully reduce potential risk to participant interests, and several asserted that it would have no legal force. Respondents also emphasized that such a click-through mechanism would make broad programmatic access to GSR difficult or impossible and would severely curtail reuse of the information. Of those who commented on this topic, the majority supported an open-access model for GSR. The burden of a click-through agreement was further highlighted by those concerned that other resources that provide unrestricted access to GSR might feel inclined to institute their own click-through agreements in response to the precedent set by such an NIH access model. Respondents posited that such a shift in precedent could therefore also result in decreased usage of those resources. It was also noted that the programmatic interfaces that provide software-enabled searches and automated queries of GSR provide power, scalability, and automation across distributed resources, amplifying the potential benefits to be achieved through the use of GSR for research and health purposes. Respondents also argued that the click-through does not meaningfully reduce risk because bad actors are unlikely to be dissuaded by its attestations, especially given the lack of accountability they associated with this method.

In consideration of the substantial comments received on this topic, NIH reassessed the rapid-access tier proposal for GSR access. NIH has determined that the rapid-access model with the click-through agreement requirement does not add protection against privacy risks proportional to the potential benefits to be gained through access to this information for research and health purposes. Internal deliberations noted that, if the rapid-access tier went forward, automated workarounds for the click-through transaction could be developed, which would at least partially circumvent the intent of the mechanism. Therefore, NIH will update the GDS Policy access model to allow access to GSR through the unrestricted access tier. NIH intends that GSR be responsibly applied to advance research or health purposes.[3] NIH also maintains that GSR users should not attempt to re-identify individuals. To highlight the importance of these principles, the three principles for responsible research use originally proposed for the click-through agreement (no attempt to re-identify, use only for research or health purposes, and review of the responsible genomic data use informational materials) will be included with any information about GSR access or in conjunction with GSR displays that may be provided through NIH-designated data repositories.

Additional Resources and Informational Materials to Support Implementation

Respondents also indicated a desire for clear guidance and educational materials to support implementation of the proposed update. Suggested topics included: 1) criteria and considerations to guide submitting institutions when designating studies as "sensitive" (in which case GSR will remain under controlled access); 2) suggested language and clarity around NIH expectations that informed consent processes articulate GSR access plans for each study; and 3) best practices for research and clinical applications of GSR.

NIH appreciates these suggestions and will consider them as further implementation guidance is developed. At this time, the Points to Consider for Institutions and Institutional Review Boards in the Submission and Secondary Use of Human Genomic Data under the NIH Genomic Data Sharing Policy has been updated to include discussion on considerations for designating studies as sensitive for the purposes of GSR access. The National Human Genome Research Institute has also added language to its Informed Consent Resource for Genomic Research discussing the inclusion of disclosures about use and access to GSR during the informed consent process.

Update

NIH is committed to maximizing public benefit from genomic information generated through NIH-supported research in a manner consistent with current scientific and ethical considerations. In support of this goal, and to promote scientific advances while protecting research participants’ privacy interests, NIH is updating its data management procedures under the GDS Policy to allow unrestricted access to GSR from most NIH-supported studies for health or research purposes. NIH anticipates that this updated GSR access model will be appropriate for the majority of genomic datasets available through NIH-designated data repositories under the NIH GDS Policy (e.g., dbGaP), and that it will thereby reduce the need to request controlled access to individual-level genomic data unless such data are necessary to address specific research questions. In addition, this modification will establish an access model for GSR that is proportional to the distinct risks associated with access to this type of information relative to those associated with access to individual-level genomic data. Integral to this data management update is the expectation that all those who access NIH GSR will: 1) complete the review of a responsible genomic data use informational module prior to accessing the information; 2) not use GSR to re-identify individuals or generate information that could allow participants’ identities to be readily ascertained; and 3) use GSR to promote scientific research or health.

For the purposes of the NIH GDS Policy, GSR are defined to include those provided by a study’s investigator, if any, as well as summary statistics that may be computed by the relevant NIH-designated data repository across all non-"sensitive" studies with data included in that repository. GSR include systematically computed statistics such as, but not limited to: 1) frequency information (e.g., genotype counts and frequencies, or allele counts and frequencies); and 2) association information (e.g., effect size estimates and standard errors, and p-values). These values may be defined and calculated using scientifically relevant subsets of research participants included within study populations (e.g., disease, trait-based, or control populations). Information on methods for computing any summary statistics provided in unrestricted access by an NIH-designated data repository will be available through the repository’s website.
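To make the two categories of statistics listed above more concrete, the following Python sketch computes frequency information and association information for a single hypothetical biallelic variant in a "disease" subset and a "control" subset of a study. It is illustrative only: the 0/1/2 genotype coding, the toy data, the choice of an allelic log odds ratio with Woolf's standard error as the effect size, and the normal-approximation p-value are all assumptions made for this example. They are not the methods of any NIH-designated data repository, whose methods are documented on each repository's website.

```python
import math
from collections import Counter

# Illustrative assumption: each genotype is coded as the number of copies
# (0, 1, or 2) of the alternate allele that one participant carries at a
# single biallelic variant.

def genotype_counts(genotypes):
    """Frequency information: number of participants carrying 0, 1, or 2 alternate alleles."""
    counts = Counter(genotypes)
    return {g: counts[g] for g in (0, 1, 2)}

def allele_counts(genotypes):
    """Frequency information: (reference, alternate) allele counts for the group."""
    alt = sum(genotypes)
    ref = 2 * len(genotypes) - alt
    return ref, alt

def allele_frequency(genotypes):
    """Frequency information: alternate allele frequency in the group."""
    ref, alt = allele_counts(genotypes)
    return alt / (ref + alt)

def allelic_association(cases, controls):
    """Association information: allelic log odds ratio (effect size), its
    standard error by Woolf's method, and a two-sided p-value from a normal
    approximation. Assumes every allele count is nonzero."""
    case_ref, case_alt = allele_counts(cases)
    ctrl_ref, ctrl_alt = allele_counts(controls)
    log_or = math.log((case_alt * ctrl_ref) / (case_ref * ctrl_alt))
    se = math.sqrt(1 / case_alt + 1 / case_ref + 1 / ctrl_alt + 1 / ctrl_ref)
    z = log_or / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return log_or, se, p_value

# Hypothetical toy data for two scientifically relevant subsets of a study.
cases = [0, 1, 1, 2, 1, 0, 2, 1]
controls = [0, 0, 1, 0, 1, 0, 0, 1]

print(genotype_counts(cases))
print(allele_frequency(cases), allele_frequency(controls))
print(allelic_association(cases, controls))
```

Each printed value summarizes a group of participants rather than describing any one of them, which is what distinguishes GSR from individual-level data; as the Homer et al. work described earlier shows, such summaries are nonetheless not entirely free of privacy risk.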

NIH acknowledges that it is possible that privacy risks related to broad access to GSR may be heightened for study populations from isolated geographic regions or with rare traits. It is also possible that certain study populations may be more vulnerable to group harm due to potential for stigma related to traits being studied or other participant protection concerns. In addition, for studies that include data on potentially stigmatizing traits, the outcomes of any privacy breach could conceivably cause greater harm to research participants than is likely under most circumstances. Therefore, institutions submitting datasets to NIH-designated data repositories should indicate in the genomic data sharing plan and the Institutional Certification if GSR from incoming studies should be provided only through controlled-access data access request and review procedures. In such cases, GSR will be accessible in conjunction with access to individual-level data and any data use limitations attached to use of the individual-level data will apply.

Informational Resources

To support awareness of the ethical responsibilities associated with responsible use of genomic information (including GSR) and to promote the NIH intent that use of genomic information be for research or health purposes, NIH will develop informational resources to be made publicly available through relevant NIH-designated data repositories.

Informed Consent

Consistent with the NIH GDS Policy, NIH expects that consent forms and the informed consent process for human genomic studies will clearly state the access plans for data and other information generated through the study, including GSR.

NIH expects consent processes and other information available to potential research participants to make clear that participation in an NIH-supported study implies an acknowledgement that investigators may aggregate and analyze the data generated through the study. NIH expects that consent processes and other information explain that such analyses or other summaries of study information (including GSR) may be shared in the scientific literature and/or through other public scientific resources, such as data sharing resources that provide broad or unrestricted access to the information. A discussion of informed consent considerations for genomics research, including access to GSR, is included within the Informed Consent Resource hosted by the National Human Genome Research Institute, as well as in the NIH Guidance on Consent for Future Research Use and Broad Sharing of Human Genomic and Phenotypic Data Subject to the NIH Genomic Data Sharing Policy.

Effective Date

This update to the NIH GDS Policy data management procedures for GSR access will be effective immediately upon publication in the NIH Guide to Grants and Contracts.

Implementation of the GDS Genomic Summary Results Access Update

On the effective date of this access management update, investigators proposing to conduct research subject to the GDS Policy will be expected to indicate in their genomic data sharing plan (which is submitted with funding requests or contract proposals, or through Institute and Center procedures for intramural researchers) whether a study should be designated as "sensitive" for the purposes of access to GSR. This "sensitive" determination should be confirmed in the Institutional Certification that is provided to NIH during the Just-in-Time period, before finalization of a contract, or before the start of intramural research. If the genomic data sharing plan has already been submitted but an Institutional Certification has not yet been provided to the appropriate Genomic Program Administrator (GPA), the updated Institutional Certification template should be used. The updated Institutional Certification templates are now available for any study to use.

For studies for which an Institutional Certification was submitted to NIH before the effective date of this update, November 1, 2018, submitting institutions will have six months to indicate whether GSR from any of these studies should be maintained in controlled access due to concerns about the sensitivity of study information. This applies both to studies whose data have not yet been submitted to an NIH-designated repository and to studies whose data are already accessible through an NIH-designated repository. Any "sensitive" designations for such studies should be made by submitting an updated Institutional Certification to the GPA for the funding Institute or Center, with a copy to the NIH GDS mailbox (gds@mail.nih.gov) and the study’s Program Officer, by May 1, 2019. Institutions may request additional time to complete the assessment for a particular study by sending an email to the same NIH contacts. In such cases, the GSR for that study will remain in controlled access until the appropriate NIH Institute or Center receives a final determination.

If a submitting institution does not contact NIH by May 1, 2019, to indicate that the GSR from a specific study should be designated as "sensitive", GSR from those studies may be provided through unrestricted access.

If a submitting institution that has already submitted an Institutional Certification to NIH wishes to confirm to NIH the appropriateness of unrestricted access to GSR from a particular study prior to the end of the six-month period, this can be indicated through the submission of a new Institutional Certification using the updated Institutional Certification template. This early action option for submitting institutions will enable GSR to be made accessible through unrestricted access immediately. Please see https://osp.od.nih.gov/scientific-sharing/genomic-data-sharing/ for more information on the update.

1. For the purposes of this data management update, genomic summary results include those provided by a study’s investigator (either as unpublished preliminary analyses or final published results), if any, as well as summary statistics computed by the relevant NIH-designated data repository.

2. An NIH-designated data repository is any data repository maintained or supported by NIH either directly or through collaboration.

3. Beyond the stated intention for GSR to be used only to advance research or health, there will be no formal data use limitations associated with access to or use of this information when available through unrestricted access.

Inquiries

Please direct all inquiries to:

NIH Office of Science Policy
Office of the Director
Telephone: 301-496-9838
Email: SciencePolicy@od.nih.gov