Notice Number: NOT-HG-15-038
Key Dates
Release Date: October 9, 2015
Issued by
National Human Genome Research Institute (NHGRI)
Purpose
NIH published the Genomic Data Sharing (GDS) Policy on August 27, 2014 (https://gds.nih.gov/). The purpose of this Notice is to inform the scientific community of how NHGRI is implementing this policy, which will cover all applications funded by NHGRI.
The current NHGRI implementation of the NIH GDS Policy is as follows:
Overarching Principles and Applicability
Broad data sharing promotes maximum public benefit from federally funded genomics research. NHGRI supports the broadest appropriate genomic data sharing with timely data release through widely accessible data repositories. These repositories may be open access (unrestricted) or, if more appropriate, controlled access (see the NIH list of data repository examples for guidance).
Whenever possible, NHGRI studies involving human data should use data generated from sources with participant consent for unrestricted access or for general research uses through controlled access. Similarly, consent language should avoid restrictions on the types of users who may access the data. NHGRI acknowledges that this will not always be possible or appropriate. In addition, individual participants who do not consent to future use or broad data sharing may still participate in the primary study, if consistent with study design.
NHGRI encourages sharing of all data types. However, at this time the NIH GDS Policy and NHGRI implementation plans apply particularly to single nucleotide polymorphism (SNP) array data, genome sequence data, transcriptomic data, epigenomic data, or other molecular data produced by array-based technologies or high-throughput sequencing technologies.
Data pertinent to the interpretation of genomic data, such as associated phenotype data (e.g., clinical information relevant to the disease under study), exposure data, and descriptive information (e.g., protocols or methodologies used), are expected to be shared. All data sets should include the appropriate metadata to allow efficient sharing and integration with other data sets.
Examples of research or research-related activities funded or supported by NHGRI that are outside the scope of the NIH GDS Policy include, but are not limited to, projects that do not meet the criteria specified in the NIH GDS Policy Supplemental Information.
Informed Consent
Per the NIH GDS Policy, informed consent documents for prospective data collection after January 25, 2015, should state what data types will be shared (e.g., genomic, phenotype, health information), for what purposes (e.g., general research use, disease-specific research use), and whether sharing will occur through open (unrestricted) or controlled access databases (or an approved alternative sharing plan). This and other information that NIH expects to be conveyed in documents used to obtain explicit consent for future research use and broad data sharing are defined in the NIH Guidance on Consent for Future Research Use and Broad Sharing of Human Genomic and Phenotypic Data Subject to the NIH Genomic Data Sharing Policy. These expectations also apply to data to be produced from cell lines or clinically derived samples.
Likewise, for research involving samples collected prior to January 25, 2015, NHGRI recognizes that informed consent processes may not have explicitly anticipated future broad data sharing or research use. In these instances, submitting institutions should ensure that the future research use and data sharing plans are not inconsistent with the informed consent provided by study participants. Relevant issues to consider in these situations are reviewed in the NIH Points to Consider for Institutions and IRBs regarding genomic data sharing.
If established or commercially available cell lines or clinical specimens created prior to January 25, 2015, are included as data sources in a study, investigators should seek whenever possible to use samples where consent for future research use and data sharing can be documented.
NIH and NHGRI acknowledge that broad data sharing may not always be appropriate. In these instances, investigators should request an exception from data deposition in an NIH-designated data repository prior to initiating research activities, if appropriate samples with broader data sharing consent are not available. Exceptions from data deposition should be justified through a data sharing plan submitted with the funding request (see "Data Sharing Plans" below).
Similarly, there may be cases involving cell lines or specimens collected after January 25, 2015, where requesting explicit consent for future research use and broad data sharing was not possible but where there are compelling scientific reasons to conduct the research with those data sources. In those cases, and consistent with any NIH guidance issued, an exception from obtaining explicit consent may be requested from NHGRI in a data sharing plan.
Investigators should note the following NHGRI expectation, which goes beyond the basic NIH expectation with regard to grandfathered data sources. NHGRI expects that by January 25, 2020, all human data used by NHGRI-funded or -supported research will be generated from specimens or cell lines for which explicit consent for future research use and broad data sharing can be documented. Research proposing to use samples lacking such consent should be accompanied by an alternative data sharing plan supported by a compelling scientific reason for using the specified data sources. Exceptions to this expectation will continue to be granted when there is a compelling scientific reason, as provided for in the NIH GDS Policy.
For more in-depth discussion of principles and best practices for drafting informed consent documents for genomics research, see the NHGRI Informed Consent Resource.
Data Sharing Plans
Resources regarding the expected elements of data sharing plans are provided on the NIH GDS Policy website, including information on how these plans are considered during peer review. After peer review, NHGRI will assess the potential value of the dataset for use in secondary analyses to confirm findings, explore different research questions, develop or refine analytic methodologies or programs, etc. In addition, funds and other resources needed for data deposition, management, or access will be considered.
For studies involving human data, NHGRI also will consider Institutional Review Board (IRB) assessments of informed consent processes and consent documents as noted in the NIH GDS Policy. Other participant protection issues for the proposed study population (e.g., particular privacy concerns or a potential for group harm) or related to the scientific design (e.g., isolated geographic population or small family studies), as evaluated by an IRB and consistent with program priorities, will also inform data sharing plan review.
As specified in the NIH GDS Policy, for studies that require an Institutional Certification, data submitting institutions (including the NHGRI Intramural Research Program) should submit the document, signed by an appropriate Institutional Signing Official, to the NHGRI Program Director or NHGRI Genomic Program Administrator (GPA), as appropriate.
Exceptions to Data Deposition and Alternative Data Sharing Plans
Per the NIH GDS Policy, NHGRI will consider requests for exceptions to standard data sharing plan expectations. When consistent with program priorities, NHGRI may accept well-justified data sharing plans that do not include broad data sharing or that include more narrow data use limitations for future research.
Basic criteria that NHGRI will use to assess exception requests include an IRB or equivalent determination that informed consent materials preclude broad data sharing, or an IRB assessment that there are additional participant protection concerns related to the nature or character of the study population (e.g., geographical location or small study designs focused on a rare disease).
Investigators may also submit a justification within a data sharing plan demonstrating that data sharing costs (e.g., financial or personnel resources) outweigh the potential for broad scientific value of access to the data.
In all cases where alternative data sharing plans are determined to be appropriate, information on how to request access to the data and a basic summary of the study and study data will be listed in dbGaP (or other appropriate NIH-designated data repository). Timelines for data submission and access under alternative data sharing plans should be consistent with those for standard data sharing under the NIH GDS Policy.
Information about any additional elements to consider in requesting exceptions from data deposition in NIH-designated data repositories will be added to this page as they are developed.
Data Submission and Release
All final datasets (human or non-human, including microbial data) generated through large-scale genomic projects, not just those datasets generated to support a publication, should be submitted to appropriate data repositories or made available through NHGRI-approved alternative data sharing plans.
All metadata and descriptive information (e.g., protocols or methodologies used) needed to support future use of the data should be submitted. As much de-identified phenotype data as is practicable should be submitted. In this context, phenotype data refers to clinical data, environmental data, demographic variables, and any non-genotype data. When appropriate, relevant phenotype data from non-human studies also should be shared through open (unrestricted access) community resource data repositories.
Large resource projects (e.g., 1000 Genomes) should share their raw data (e.g., reads), intermediate data (e.g., assemblies), and processed data (e.g., variant calls, genotypes, haplotypes). When possible, investigators should use standard formats and vocabularies/ontologies to describe data elements (e.g., sequence data, variants, or phenotypes).
Clear milestones for the timing of data deposition should be established for each project to provide a timeline by which to assess progress toward meeting data submission expectations. Milestones should adhere to standard data release timelines outlined in the NIH GDS Policy Supplemental Information and the NHGRI Guidance for Data Submission and Data Release table below, and should be agreed upon prior to the start of research projects. Large resource projects may develop project-specific timelines for data release, in conjunction with program officers, that exceed the minimum expectations specified in the NIH GDS Policy Supplemental Information and the NHGRI Guidance for Data Submission and Data Release table.
Unless otherwise specified by project funding announcements, analyses by submitting investigators that are conducted subsequent to the initial data submission, final data sets, or any data updates should be submitted for release concurrent with the first publication analyzing the dataset.
Investigators should note the following NHGRI data release expectation for non-human genomic data that differs from the NIH expectation. Data sharing plans for NHGRI-funded or -supported projects to generate non-human genomic data proposed after January 25, 2016, should include pre-publication timelines for data submission and release consistent with NIH GDS Policy expectations for human genomic data (including a possible holding period before data release not to exceed six months).
Data sharing progress reports will be expected consistent with trans-NIH processes as they are implemented, or through other NHGRI consortia reporting mechanisms, as applicable. Program directors will monitor progress against the timelines established through the data sharing plans.
NHGRI Guidance for Data Submission and Data Release
Level | General Description | Example Data Types | Data Submission Expectation | Data Release
0 | Raw data generated directly from the instrument platform | Instrument image data | Human data: Not expected. | Human data: NA.
1 | The basic data after the initial processing of raw input data | DNA sequence reads, ChIP-Seq reads, RNA-Seq reads, SNP array data, array CGH data | Human data: Not expected. | Human data: NA. Non-human data: No later than the time of initial publication; an earlier release date may be designated for certain data types or NIH projects.
2 | Data after an initial round of processing or computation to clean the data and assess basic quality measures | DNA sequence alignments to a reference sequence or de novo assembly, RNA expression profiling | Human data: After data cleaning and quality control, which is generally within 3 months after data were generated. Project specific. Non-human data*: Data submission expected at the time of initial publication; an earlier submission date may be designated for certain data types or NIH projects. | Human data: Up to 6 months after data submission is initiated or at the time of acceptance of initial publication, whichever occurs first. Non-human data: Data released at the time of initial publication; an earlier release date may be designated for certain data types or NIH projects.
3 | Analysis to identify genetic variants, gene expression patterns, or other features of the dataset | SNP or structural variant calls, genotypes, expression levels, epigenomic features | Human data: After cleaning and quality control, which is generally within 3 months after data have been generated. Project specific. Non-human data*: Data submission expected at the time of initial publication; an earlier release date may be designated for certain data types or NIH projects. | Human data: Up to 6 months after data submission is initiated or at the time of acceptance of initial publication, whichever occurs first. Non-human data: Data released at the time of initial publication; an earlier release date may be designated for certain data types or NIH projects.
4 | Final analysis that relates the genomic data to phenotype or other biological states | Genotype-phenotype relationships, relationships of RNA expression or epigenomic patterns to biological state | Human data: Data submitted as analyses are completed. Non-human data*: Data submission expected at the time of initial publication. | Human data: Data released with publication. Non-human data: Data released at the time of initial publication.
* Investigators should note the following NHGRI data release expectation for non-human genomic data that differs from the NIH expectation. Data sharing plans for NHGRI-funded or -supported projects to generate non-human genomic data proposed after January 25, 2016, should include pre-publication timelines for data submission and release consistent with NIH GDS Policy expectations for human genomic data (including a possible holding period before data release not to exceed six months).
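As an illustration only, the release timing stated above for Level 2 and Level 3 human data ("up to 6 months after data submission is initiated or at the time of acceptance of initial publication, whichever occurs first") can be sketched as a small date calculation. This is not an NIH or NHGRI tool. The function names add_months and latest_human_release are hypothetical, and reading "6 months" as six calendar months is an assumption; project-specific timelines agreed with program staff take precedence.

```python
import calendar
from datetime import date
from typing import Optional


def add_months(start: date, months: int) -> date:
    """Return the date `months` calendar months after `start`.

    The day is clamped to the last day of the target month
    (for example, August 31 plus six months lands on the last day of February).
    """
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def latest_human_release(
    submission_started: date,
    publication_accepted: Optional[date] = None,
    holding_months: int = 6,  # assumption: "6 months" means calendar months
) -> date:
    """Latest release date for Level 2 or 3 human data under the stated rule.

    The result is the earlier of (a) the end of the holding period counted
    from the start of data submission and (b) the acceptance date of the
    initial publication, when that date is known.
    """
    deadline = add_months(submission_started, holding_months)
    if publication_accepted is not None:
        return min(deadline, publication_accepted)
    return deadline


if __name__ == "__main__":
    # Hypothetical dates, chosen only to exercise both branches of the rule.
    print(latest_human_release(date(2016, 1, 15)))
    print(latest_human_release(date(2016, 1, 15), date(2016, 5, 1)))
```

In this sketch an unknown publication date simply leaves the holding-period deadline in force; once an acceptance date is known, the earlier of the two dates applies.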
Governance and Contact Information
The NHGRI Genomic Data Sharing Governance Committee provides ongoing stewardship and leadership for Institute data sharing policies and their implementation.
Contact: Laura Lyman Rodriguez, Ph.D.
E-mail: laura.rodriguez@nih.gov
The NHGRI Data Access Committee (DAC) provides oversight and monitoring of data access activities and participant protection needs related to all NHGRI-hosted data sets.
Contact: Vivian Ota Wang, Ph.D.
E-mail: Vivian.OtaWang@nih.gov
The NHGRI Genomic Program Administrator (GPA) functions as a central point of coordination and information about NHGRI implementation of NIH and NHGRI policies.
Contact: Vivian Ota Wang, Ph.D.
E-mail: Vivian.OtaWang@nih.gov
Additional Information
For additional information related to NHGRI's implementation of the NIH GDS Policy, see Additional Resources.
The description of NHGRI’s implementation of the NIH GDS Policy can be found at http://www.genome.gov/27562511. This information will be updated as needed.
Inquiries
Please direct all inquiries to:
Laura Lyman Rodriguez, Ph.D.
National Human Genome Research Institute (NHGRI)
Telephone: 301-594-7185
Email: rodrigla@mail.nih.gov