Request for Information on Processes for dbGaP Data Submission, Access, and Management

Notice Number: NOT-OD-17-044

Key Dates
Release Date: February 21, 2017

Related Announcements

Issued by
National Institutes of Health (NIH)


This Request for Information (RFI) seeks public comments on the data submission and access processes for the NIH National Center for Biotechnology Information (NCBI) database of Genotypes and Phenotypes (dbGaP), and on the management of data in dbGaP, in order to consider options to improve and streamline these processes and to maximize the utility of dbGaP.

Response to this RFI is voluntary. Respondents are free to address any or all of the topics listed in the request or any other relevant topics they recognize as important for NIH to consider for dbGaP data submission, access, and management. Respondents should not feel compelled to address all items. Instructions on how to respond to this RFI are provided under "Submitting a Response" below.


NIH policies for the sharing of genomic and associated phenotypic data, the 2014 NIH Genomic Data Sharing (GDS) Policy [1] and its predecessor, the 2007 NIH Policy for Sharing of Data Obtained in NIH Supported or Conducted Genome-Wide Association Studies (GWAS Policy) [2], set forth expectations and responsibilities to ensure the broad and responsible sharing of genomic research data in a timely manner. Fundamental to NIH’s stewardship of these data is respect for and protection of research participants’ interests. In 2007, NIH developed dbGaP [3] to archive and distribute the results of human genotype-phenotype studies that fall under these policies, in a manner that is consistent with the consent of the study participants whose genomic data are to be shared. dbGaP, a controlled-access data repository, currently serves as a central portal to submit, locate, and request access to human genomic (e.g., GWAS, sequencing, expression, epigenomics data) and associated phenotypic and exposure datasets. NIH has established a governance system to facilitate the development and oversight of consistent, transparent, and efficient processes for using dbGaP and related genomic data sharing activities under the NIH GDS Policy [4].

As of January 2017, dbGaP maintains 4,625 datasets from 786 studies, representing over 1.2 million unique research participants. To date, over 44,000 Data Access Requests (DARs) submitted by 4,898 investigators from 46 countries have been processed. Even though dbGaP is a rapidly growing and highly utilized resource and many improvements to the dbGaP data submission and access processes have been made [5], NIH believes that the processes for requesting and submitting data could be streamlined and improved. Through this RFI, NIH seeks public feedback on the dbGaP data submission and access processes, and data management practices, to inform NIH about how to make dbGaP systems more user-friendly and efficient as they continue to grow and evolve.

Information Requested

The NIH invites feedback pertaining to any opportunities or challenges related to the following topics, as well as potential areas and opportunities to improve understanding, efficiency, or transparency of the processes associated with these topics:

1. dbGaP Study Registration and Data Submission [6]. Examples of areas of possible comments include, but are not limited to:

  • dbGaP study registration process for NIH-funded studies or non-NIH-funded studies
  • dbGaP data submission process
  • Technical aspects of study registration, data submission, and data release (e.g., obtaining study accession numbers, data formatting and standards, data transmission)

2. dbGaP Data Access Request (DAR) and Review [6]. Examples of areas of possible comments include, but are not limited to:

  • DAR process, including, for example:
    • Obtaining the credentialing necessary to request access (e.g., through eRA Commons)
    • Identifying and selecting studies and datasets of research interest on the dbGaP website
  • Data Access Committee review process
  • Downloading data from dbGaP
  • Project renewal and close-out processes (e.g., length of data access period, information requested in progress updates or close-out reports)

3. Policies for the Management and Use of dbGaP Data. Over time, NIH has received feedback about existing practices for managing access to data subject to the GDS Policy. To further inform NIH policies and practices for the management of dbGaP data, NIH is interested in public feedback on topics such as the following:

  • Alternate controlled-access models that increase efficiency of the management, oversight, and use of data at a single institution or multiple, collaborative institutions
  • Benefits and risks associated with the availability of genomic study summary statistics [7]. This type of information is currently managed through controlled access in dbGaP [8] and was the topic of the recent National Human Genome Research Institute (NHGRI) Workshop on Sharing Aggregate Genomic Data [9]. Examples of areas of possible comments include, but are not limited to:
    • Risks and benefits of different management models for genomic summary statistics related to participant privacy and/or scientific opportunity for its broad use
    • Alternative options for providing access to genomic summary statistics beyond unrestricted or controlled-access models (e.g., registered access)
    • Factors to consider in determining the risk-benefit balance in the management of and access to genomic summary statistics for specific datasets (e.g., those including sensitive information or vulnerable populations)
    • Methods for mitigating risks associated with unrestricted access to genomic summary statistics
  • Clinical Use of Genomic Research Data Maintained in Controlled-Access in dbGaP. The medical genetics community is increasingly interested in obtaining access to dbGaP resources (e.g., NCBI dbGaP Data Browser [10]) for clinical reference uses, such as obtaining additional information to help interpret the significance of a genomic variant. This type of use might be analogous to reference use by health care providers (e.g., physicians, genetic counselors, clinical laboratorians) of NIH-supported resources such as Online Mendelian Inheritance in Man (OMIM) [11], ClinVar [12], or the Genetic Testing Registry (GTR) [13]. NIH is interested in public comments on benefits and risks of such reference use of dbGaP data for research participants, patients, and the scientific community.

4. General Comments. NIH welcomes general comments on any other topics with regard to dbGaP data submission, access, and management.

Submitting a Response

Comments on the topic areas of interest should be submitted electronically to the following webpage: https: or mailed to: Office of Science Policy (OSP), National Institutes of Health, 6705 Rockledge Drive, Suite 750, Bethesda, MD 20892, or by fax to: 301-496-9839 by April 7, 2017.

This RFI is for planning purposes only and should not be construed as a policy, solicitation for applications, or as an obligation on the part of the Government to provide support for any ideas identified in response to it. Please note that the United States Government will not pay for the preparation of any information submitted or for its use of that information.

Comments received, including any personal information, will be posted without change after the close of the comment period to the NIH GDS website [14]. Please do not include any proprietary, classified, confidential, or sensitive information in your response. We look forward to your input and hope that you will share this RFI document with your colleagues. Updates to this document, if any, will be noted.

The Government reserves the right to use any non-proprietary technical information in summaries of the state of the science, and any resultant solicitation(s). The NIH may use information gathered by this RFI to inform development or modification of data sharing databases, websites, policies and practices, processes and procedures, and supporting documentation (e.g., guidance, FAQs).


[3] dbGaP was developed by the National Center for Biotechnology Information, National Library of Medicine, NIH.
[5] OSP Poliscope blog post describing previous improvements to dbGaP (link to be inserted when available).
[7] For the purposes of this document, genomic summary statistics are defined as: calculated summary statistics, including genotype counts, allele frequencies, effect size estimates and standard errors, and p-values calculated from a study sample. 



Please direct all inquiries to:

NIH Office of Science Policy
Division of Scientific Data Sharing Policy
Telephone: 301-496-9838