Request for Information (RFI): Proposed Policy for Sharing of Data obtained in NIH supported or conducted Genome-Wide Association Studies (GWAS)

Notice Number: NOT-OD-06-094

Key Dates
Release Date: August 30, 2006
Response Date: October 31, 2006
Revised Response Date: November 30, 2006 (Per NOT-OD-07-012)

Update: The following update relating to this announcement has been issued:

  • August 28, 2007 - See Notice (NOT-OD-07-088) Policy for Sharing of Data Obtained in NIH Supported or Conducted Genome-Wide Association Studies (GWAS).

Issued by
National Institutes of Health (NIH)

The NIH is seeking comments regarding a proposed policy for NIH-supported or NIH-conducted Genome-Wide Association Studies (GWAS). A genome-wide association study is currently defined as any study of genetic variation across the entire human genome that is designed to identify genetic associations with observable traits (such as blood pressure or weight) or with the presence or absence of a disease or condition. The proposed policy addresses (1) data sharing procedures, (2) data access principles, (3) intellectual property, and (4) the protection of research participants through all phases of GWAS. Many of the principles contained in the policy reflect and extend existing NIH policies (e.g., the 2003 data sharing policy 1) and other recent NIH discussions.2

The goal of the proposed policy is to advance science for the benefit of the public through the creation of a centralized NIH GWAS data repository. Maximizing the availability of resources facilitates research and enables medical science to better address the health needs of people based on their individual genetic information. The NIH is seeking public input and advice on the overall concept of the proposed policy and specific feedback on the following questions:

1. What are the potential benefits and risks associated with wide sharing of phenotypic and genotypic data where identifying information has been removed?

2. In addition to removing personal identifying information, what protections are needed to minimize risks to research participants whose phenotypic and genotypic data are included in a centralized NIH data repository and shared with qualified investigators for research purposes?

3. What are the advantages and disadvantages of the proposed:

  1. centralized NIH data repository?
  2. approach to data submission?
  3. approach to scientific publication?
  4. approach to intellectual property?

4. What specific resources may investigators and institutions need to meet the goals of this proposed policy?

The NIH encourages comments concerning its proposed policy to enhance access to GWAS data as outlined in this notice. Persons, groups, and organizations interested in commenting on the NIH's proposed policy should direct their comments to the designated NIH Web site. Alternatively, comments may be submitted by e-mail or sent by mail to the following address:

NIH GWAS RFI Comments, National Institutes of Health, Office of Extramural Research, 6705 Rockledge Drive, Room 350, Bethesda, MD 20892-7963.

Background

The NIH is interested in advancing GWAS to identify common genetic factors that influence health and disease. Whole-genome information, when combined with clinical and other phenotypic data, offers the potential for increased understanding of basic biological processes affecting human health, improved prediction of disease, better patient care, and, ultimately, realization of the promise of personalized medicine. In addition, rapid advances in understanding the patterns of human genetic variation, together with maturing high-throughput, cost-effective genotyping methods, are providing powerful research tools for identifying genetic variants that contribute to health and disease. For these reasons, the NIH announced in a May 15, 2006 Notice that it planned to: (1) update the NIH data sharing policy for research applications involving GWAS data; (2) initiate a public consultation process to inform policy development activities; and (3) track GWAS applications and awards at a central level. This RFI serves as the first step in the public consultation process referenced in the May 15, 2006 Notice.

Protecting Research Participants. The potential public benefit of sharing GWAS data is significant. However, the genotypic and phenotypic information generated about individuals, such as data related to the presence of, or risk of developing, particular diseases or conditions and information regarding paternity or ancestry, may be both substantial and sensitive. It is therefore critically important that the privacy and confidentiality of participants be protected. Risks to individuals, groups, or communities should be carefully weighed against the potential benefits of the knowledge to be gained through GWAS. The nature of GWAS information about participants and the broad data distribution goals of the NIH GWAS data repository highlight the importance of the informed consent process to this research. To protect research participants, the NIH will establish mechanisms to oversee the repository and monitor GWAS data use practices.

The NIH recognizes that there are evolving scientific, ethical and societal issues relevant to this proposed policy and will revisit and revise the policy as appropriate.

Proposed Policy for Genome-Wide Association Studies (GWAS)


Consistent with both the NIH mission to improve public health through research and its longstanding legislative mandate to make available to the public the results of the research activities that it supports and conducts, the NIH believes that the full value of GWAS to the public can be realized only if the genotype and phenotype datasets are made available as rapidly as possible to a wide range of scientific investigators. Rapid and broad data access is particularly important for GWAS because of the significant resources involved, the challenges of analyzing large datasets, and the extraordinary opportunities for making comparisons across multiple studies.

Protection of research participants is a fundamental principle underlying biomedical research. The NIH is committed to responsible stewardship of data throughout the research process, which is essential to protecting the interests of study participants and to maintaining public trust in biomedical research.

Scope

This draft policy is proposed to apply to active research applications identified by applicants or NIH staff as GWAS per NOT-OD-06-071.

Data Management

Data Repository. To facilitate broad and consistent access to NIH-supported GWAS datasets, the NIH proposes the development of a central GWAS data repository at the NIH, within the National Center for Biotechnology Information (NCBI), National Library of Medicine. The repository will provide a single point of access to basic information about NIH-supported GWAS and to available GWAS genotype-phenotype datasets. Although the NIH envisions that access to all NIH-supported GWAS datasets will be possible through this repository, it does not intend the repository to become the exclusive source of these data. The repository will also accept GWAS datasets contributed from other sources.

Data Submission. All investigators who receive NIH support to conduct genome-wide analysis of genetic variation in a study population are expected to submit to the GWAS data repository descriptive information about their studies for inclusion in an open access portion of the GWAS data repository. This information should include the following:

  • the protocol,
  • questionnaires,
  • study manuals,
  • variables measured, and
  • other supporting documentation.

In addition, the NIH strongly encourages the submission of curated and coded phenotype, exposure, genotype, and pedigree data, as appropriate, to the GWAS data repository as soon as quality control procedures have been completed at the local institution. These detailed data will be made available through a controlled access process according to the GWAS Data Access procedures (described below). Investigators who elect to submit their GWAS data to additional data repositories or networks should verify that appropriate data security, confidentiality, and privacy measures are in place for the protection of GWAS participants.

To minimize risks to study participants, data will be submitted to the GWAS data repository stripped of identifiable information and labeled with a random, unique code. Keys to the codes will be held by the submitting institutions. Submissions of GWAS data should be accompanied by a written certification stating that the identities of research participants will not be disclosed to the GWAS data repository or to secondary users of the coded data without appropriate institutional approvals. Consequently, research participants should not expect the return of individual research results derived from analyses of submitted data.
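The notice does not prescribe how submitting institutions should generate codes. As a minimal sketch only, assuming each local record carries a hypothetical `participant_id` field and a hypothetical set of direct identifiers, the coding step could look like this: the coded records are what would be submitted, while the key linking codes back to participants stays with the institution.

```python
import secrets

# Hypothetical set of direct identifiers; an actual study would follow its
# IRB's and institution's de-identification requirements.
IDENTIFYING_FIELDS = {"name", "address", "date_of_birth", "medical_record_number"}


def code_records(records):
    """Replace participant identifiers with random, unique codes.

    Returns (coded_records, key). Only coded_records would be submitted;
    the key (code -> local participant_id) is retained by the institution.
    """
    coded_records, key = [], {}
    for record in records:
        code = secrets.token_hex(8)  # 64 random bits, hex-encoded
        while code in key:  # guarantee uniqueness within this submission
            code = secrets.token_hex(8)
        key[code] = record["participant_id"]
        coded = {
            name: value
            for name, value in record.items()
            if name not in IDENTIFYING_FIELDS and name != "participant_id"
        }
        coded["code"] = code
        coded_records.append(coded)
    return coded_records, key


if __name__ == "__main__":
    local = [
        {"participant_id": "P001", "name": "Example A", "genotype": "AG", "systolic_bp": 128},
        {"participant_id": "P002", "name": "Example B", "genotype": "GG", "systolic_bp": 141},
    ]
    submitted, retained_key = code_records(local)
    print(submitted)  # codes, genotypes and phenotypes only
```

Drawing codes at random, rather than deriving them from names or record numbers (for example by hashing), means a code cannot be linked back to a participant without the key held by the institution.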

All submissions to the GWAS data repository should be accompanied by:

  • a certification by the responsible IRB that it has reviewed and approved the submission to the GWAS data repository, noting specifically:
    • that inclusion in the GWAS data repository and subsequent sharing for appropriate research purposes is consistent with the initial informed consent process of study participants from whom the data were obtained; and
    • any uses of the data that are specifically excluded by the informed consent provided by study participants, which will be noted in the database; and
  • a statement from the institution from which data are contributed that submission of the data is in accord with all applicable laws and regulations.3

Data Access. The basic descriptive information submitted to the GWAS data repository for each NIH-conducted or NIH-supported GWAS will be available to the public through the GWAS data repository. Access to the genotype and phenotype datasets submitted to and stored in the GWAS data repository, along with pre-computed analyses (such as simple genotype-phenotype associations and a listing of all variants known to be in linkage disequilibrium4 with variants showing significant association with a phenotype or trait), will be provided for research purposes through an NIH Data Access Committee (DAC). The NIH anticipates that individual DACs may be established based on programmatic areas of interest and the relevant needs for technical and ethics expertise. All DACs will operate under common principles and similar mechanisms to ensure the consistency and transparency of the GWAS data access process.
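The notice does not specify which statistics the repository would pre-compute. As one illustration only, assuming biallelic variants coded as 0, 1 or 2 copies of an alternate allele and a simple case-control design, a per-variant allelic chi-square test is a common form of "simple genotype-phenotype association":

```python
import math


def allele_counts(genotypes):
    """Count (alt, ref) alleles from genotypes coded as 0, 1 or 2 alt copies."""
    alt = sum(genotypes)
    return alt, 2 * len(genotypes) - alt


def allelic_chi_square(case_genotypes, control_genotypes):
    """Allelic 2x2 chi-square test (1 degree of freedom, no continuity correction).

    Returns (chi2, p_value).
    """
    a, b = allele_counts(case_genotypes)     # cases: alt, ref
    c, d = allele_counts(control_genotypes)  # controls: alt, ref
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0, 1.0  # empty or monomorphic table: no evidence of association
    chi2 = n * (a * d - b * c) ** 2 / denominator
    # With 1 degree of freedom, the chi-square upper tail equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    cases = [2] * 10 + [1] * 10 + [0] * 30     # 30 alt alleles out of 100
    controls = [2] * 20 + [1] * 10 + [0] * 20  # 50 alt alleles out of 100
    print(allelic_chi_square(cases, controls))
```

A genome-wide analysis would repeat such a test for every variant and apply a multiple-testing threshold; that step, and the choice of test itself, are assumptions here rather than details given in the notice.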

Investigators seeking data from the GWAS data repository will be asked to submit a Data Use Certification that is co-signed by the designated Institutional Official, for approval by the appropriate NIH DAC. Data Use Certifications should include a brief description of the proposed research use of the requested GWAS dataset(s). Within a Data Use Certification investigators will stipulate that they will:

  • use the data only for the approved research use;
  • protect data confidentiality;
  • follow all applicable laws and any local institutional policies and procedures for handling GWAS data;
  • not attempt to identify individual participants from whom data within a dataset were obtained;
  • not sell or share any of the data elements from datasets obtained from the GWAS data repository with third parties; and
  • provide annual progress reports on research.

Access to GWAS datasets through the GWAS data repository will be approved by DACs following: (1) the completion of the Data Use Certification; and (2) confirmation that the proposed research use is consistent with any constraints identified by the institutions that submitted the dataset to the GWAS data repository.
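The two approval conditions can be expressed as a simple check. This is a hypothetical sketch: the record types, the `use_category` label and the idea of matching it against a set of excluded uses are assumptions for illustration, and an actual DAC decision would involve human review rather than an automated rule.

```python
from dataclasses import dataclass, field


@dataclass
class DataUseCertification:
    """Hypothetical record of a Data Use Certification request."""
    research_use: str  # brief description of the proposed research use
    use_category: str  # hypothetical label compared against dataset constraints
    cosigned_by_institutional_official: bool


@dataclass
class SubmittedDataset:
    """Hypothetical dataset entry carrying constraints from the submitting institution."""
    name: str
    excluded_uses: set = field(default_factory=set)


def meets_access_conditions(cert, dataset):
    """Return True only if both conditions stated in the proposed policy hold."""
    # Condition 1: the Data Use Certification is complete (description present, co-signed).
    certification_complete = (
        bool(cert.research_use.strip()) and cert.cosigned_by_institutional_official
    )
    # Condition 2: the proposed use is not excluded by the submitting institution.
    consistent_with_constraints = cert.use_category not in dataset.excluded_uses
    return certification_complete and consistent_with_constraints
```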

Publication

The NIH expects that for a defined period of time following the release of a given genotype-phenotype dataset through the GWAS data repository (including the pre-computed analyses of the data), the investigators who contributed the data to the GWAS data repository should retain the exclusive right to publish analyses of the dataset. During this period of exclusivity, the NIH may grant access to other investigators, who may analyze the data, but are expected not to publish their analyses or conclusions during this period. This period of exclusivity is presently anticipated to be nine months from the date that the GWAS dataset is made available for access through the GWAS data repository, although a shorter period of exclusivity may be requested by the NIH funding Institute or Center. Contributing investigators are encouraged to shorten any such period of publication exclusivity at their own discretion. Following the expiration of the exclusive publication period for a given GWAS dataset, NIH expects that any investigator with access to the data may submit publications for any purpose consistent with the practices and policies of their institution and the NIH.

The NIH also expects that all investigators who access GWAS datasets will acknowledge the Contributing Investigator(s) who conducted the original study, and the funding organization(s) that supported the work in all resulting oral or written presentations, disclosures, or publications of the analyses.
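As an illustration of the timing only, the sketch below assumes the nine-month period is counted in calendar months from the date the dataset is made available, that a shorter period requested by a funding Institute or Center is expressed as a smaller `months` value, and that non-contributing investigators may publish on or after the computed end date. The notice does not define these details, and the function names are hypothetical.

```python
import calendar
from datetime import date


def add_months(start, months):
    """Add calendar months, clamping the day to the end of the target month."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def exclusivity_ends(release_date, months=9):
    """First date on which non-contributors may publish (assumed interpretation)."""
    return add_months(release_date, months)


def may_publish(is_contributing_investigator, release_date, on_date, months=9):
    """Contributors may always publish; others only after the exclusivity period."""
    if is_contributing_investigator:
        return True
    return on_date >= exclusivity_ends(release_date, months)


if __name__ == "__main__":
    print(exclusivity_ends(date(2007, 5, 31)))  # 2008-02-29 (day clamped, leap year)
```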

Intellectual Property

The NIH hopes that genotype-phenotype associations identified through NIH-supported and maintained GWAS datasets, and their obvious implications, will remain available to all investigators, unencumbered by intellectual property claims. The NIH discourages premature claims on pre-competitive information that may impede research, though it encourages patenting of technology suitable for subsequent private investment that may lead to the development of products that address public needs.

The NIH will provide approved GWAS data users with information regarding any significant associations within GWAS genotype-phenotype data and other pre-computed analyses (described in the Data Access section above) as a component of the GWAS datasets distributed through the GWAS data repository.

The NIH expects that NIH-supported genotype-phenotype data made available through the GWAS data repository and all conclusions derived directly from them will remain freely available, without any licensing requirements, for uses such as, but not necessarily limited to, markers for developing assays and guides for identifying new potential targets for drugs, therapeutics, and diagnostics. The intent is to discourage the use of patents that would prevent the use of or block access to any genotype-phenotype data developed with NIH support. The NIH encourages broad use of NIH-supported genotype-phenotype data that is consistent with a responsible approach to management of intellectual property derived from downstream discoveries as outlined in the NIH’s Best Practices for the Licensing of Genomic Inventions and its Research Tools Policy.

The filing of patent applications and/or the enforcement of resultant patents in a manner that might restrict use of NIH-supported genotype-phenotype data could substantially diminish the utilization of information and the potential public benefit they could provide. Approved Users and their institutions, through the execution of an NIH Data Use Certification, will acknowledge the goal of ensuring the greatest possible public benefit from NIH-supported GWAS.

Expectations for Investigators Under the Proposed Policy

Although the detailed expectations are enumerated in the individual sections of this proposed policy, they are summarized as follows:

Investigators submitting GWAS data will be expected to:

  • provide descriptive information about their studies;
  • submit coded genotypic and phenotypic data to the GWAS data repository;
  • submit a certification by the responsible IRB that it has reviewed and approved submission to the NIH, noting any limitations on data use based on the relevant informed consents; and
  • submit an assurance from the responsible institution that all data are submitted to the NIH in accord with applicable law.

Investigators requesting GWAS data will be expected to:

  • submit a description of the proposed research project;
  • submit a Data Use Certification co-signed by their sponsoring institution;
  • protect data confidentiality; and
  • submit annual progress reports detailing significant research findings.


Inquiries will be accepted electronically, or comments can be mailed to NIH GWAS RFI Comments, National Institutes of Health, Office of Extramural Research, 6705 Rockledge Drive, Room 350, Bethesda, MD 20892-7963.


1 The 2003 NIH Data Sharing Policy applies to investigators seeking $500,000 or more in direct costs in any year.

2 Request for Information on Modifications to the NHLBI Policy for Distribution of Data from Clinical Trials and Epidemiology Studies, 2006.

3 Applicable federal regulations may include HHS human subjects regulations (45 CFR Part 46), FDA human subjects regulations (21 CFR Parts 50 and 56), and the Health Insurance Portability and Accountability Act Privacy Rule (45 CFR Part 160 and Part 164, Subparts A and E).

4 Linkage disequilibrium information will be based on data from the International HapMap Project.