Notice on Development of Data Sharing Policy for Sequence and Related Genomic Data

Notice Number: NOT-HG-10-006

Update: The following update relating to this announcement has been issued:

Key Dates
Release Date: October 19, 2009

Issued by
National Institutes of Health (NIH)


The purpose of this Notice is to inform the research community of plans by the National Institutes of Health (NIH) to:

1. Update data sharing policies for NIH-supported research, including extramural and intramural projects, involving sequence and related genomic data obtained with advanced sequencing technology (e.g., medical resequencing data; sequence data from non-human species, including microorganisms; transcriptomic and epigenomic data; and data needed for interpretation, including associated clinical and other phenotype data and metadata, such as supporting study documents and methodologies);
2. Encourage investigators and IRBs to consider the potential for broad sharing of sequence and related genomic data in developing informed consent processes and documents for such studies involving human sequence data; and,
3. Communicate the agency’s intent and current underlying considerations related to developing a policy pertaining to the deposition of these large datasets into centralized databases, such as the GenBank Short Read Archive (SRA) or the Database of Genotypes and Phenotypes (dbGaP), so that they are available as broadly and rapidly as possible to a wide range of scientific investigators.

Need for Broad Data Sharing Policies

Rapid advances in applying genomic approaches to developing an understanding of the patterns of genetic and epigenetic variation, gene expression, and chromosomal organization are being made possible by maturing, more-effective methods and technologies for generating very large sequence data sets. These data sets are not only valuable for addressing the questions that the experiments were designed to ask, but also have added scientific value when combined with other large data sets. Consistent with the NIH mission to improve public health through research and the longstanding NIH policy to make data publicly available from the research activities that it funds, the NIH has concluded that the full value of sequence-based genomic data can best be realized by making the sequence, as well as other genomic and phenotype datasets derived from large-scale studies, available as broadly as possible to a wide range of scientific investigators. Therefore, the NIH intends to initiate a policy development process for data sharing of sequence and related genomic data. In undertaking this process, the NIH acknowledges the importance of recognizing the valuable and unique contributions made by the scientists who have collected the biological samples and associated phenotype information, and generated the initial sequence data.

As many of the sequencing studies conducted in the next several years may include human clinical/phenotype information, safeguarding the interests of research participants will be an essential component of any sequence data sharing policy. The NIH recognizes that data sharing practices for data sets derived from studies with human research participants must be consistent with the informed consent provided by the individual participants and, therefore, anticipates that investigators will consider or develop appropriate informed consent approaches that permit such data sharing going forward [1].

Current policy discussions will focus on how best to facilitate the maximum utility of sequence and related data by providing broad access to these data for research and computational analysis, which will ultimately lead to the development of new prognostic, diagnostic, preventative and therapeutic approaches to human disease. The NIH considers broad data access to be particularly important for sequence and related genomic data because of the significant resources involved in generating such data (which necessarily limits the number of projects that can be supported for any disease), the analytical and computational challenges involved in interpreting such large datasets, and the powerful opportunities that will be provided by the ability to make comparisons across multiple studies.

Building upon Prior Data Sharing Policies for Large-scale Genome Projects

The issues to be considered in developing the new sequence data sharing policy overlap considerably with issues considered in previous NIH policy development activities [2], in particular the NIH genome-wide association studies (GWAS) data sharing policy. Therefore, in the development of sequence data sharing policies, the NIH expects to build upon those policies as well as the extensive public consultation efforts that were undertaken through interactions with scientific and public stakeholders in the development of the GWAS policy.

In addition, any updated NIH policies will be broadly consistent with previously enunciated NIH policies on the sharing of research tools. We also note that any such policies should be broadly consistent with guidance developed through several international meetings on the release of genome sequence and other large data sets.

Policy Considerations Under Development

NIH is currently considering several issues relating to the deposition and release of sequence and related data through central database resources, including:

1. The characteristics and rationales for determining whether a project might be subject to any new sequence and other genomic data-release policy, such as:

  • The broad utility of the dataset (either alone or in combination with other available data) to the wider community, e.g., for performing new analyses, critical validations, and additional studies not considered by the original data producer;
  • The uniqueness of the dataset, as indicated in part by the size and cost to NIH of the dataset to be produced, including the extent of the genomic region(s) to be analyzed, the number of samples to be analyzed, and the target amount of data to be produced;
  • The quality and extent of the available associated non-sequence data such as phenotype and exposure data in the study population(s), if relevant to the study design; and,
  • The consideration of issues related to the protection of participant interests, e.g., privacy, confidentiality and the consistency of the informed consent process with broad data sharing, in projects involving human data.

2. The specific types of primary and processed data that should be released; the types of accompanying metadata and other annotation (e.g., phenotype, epigenetic mark) that may be critical for data interpretation; and the technical capabilities and requirements for broad sequence data sharing, including standard data formats.
3. Timing of broad data release, including the potential for pre-publication data release.
4. Mechanisms and policies for making data available to third parties, and terms of access, including the possibility of a period of exclusivity for publication by the Principal Investigator and collaborators, and consideration of any limitations on future data use within the informed consent agreement.
5. Costs of implementing policies to investigators, institutions, and NIH.

This Notice serves as a reminder for investigators to consider the issues listed above carefully in designing any study seeking NIH support that includes the use of large-scale advanced sequencing technologies, and within any applications submitted to the NIH, as applicable. Some NIH Institutes/Centers may begin considering these issues prior to release of a final policy; applicants are encouraged to contact their program official for specific issues related to an application or project. The NIH anticipates that this policy development process will occur over the next several months. At an appropriate time before the policy is implemented, the NIH will publish additional details on the policy plans. To share comments or for further information regarding policy development, please contact the representative listed below.

[1] For information and additional resources on informed consent in genomics research, see

[2] For example,,,, and


Laura Lyman Rodriguez, Ph.D.
National Human Genome Research Institute
National Institutes of Health
31 Center Drive, Room 4B09
Bethesda, Maryland 20892
Phone: 301-496-0844