Genotype and Phenotype Data Now Available from dbGaP Database; Request Process Involves New Procedures for Principal Investigators and Signing Officials

Notice Number: NOT-LM-07-001

Key Dates
Release Date: May 29, 2007

Issued by
National Library of Medicine (NLM)


Researchers may now begin requesting individual-level genotype and phenotype data from dbGaP, the database of Genotype and Phenotype. The database, which was developed and is operated by the National Library of Medicine’s National Center for Biotechnology Information (NCBI), archives and distributes data from studies that have investigated the relationship between phenotype and genotype, such as genome-wide association studies (GWAS).

dbGaP provides two levels of access: open (available to anyone with no restrictions) and controlled (requiring preauthorization). NCBI launched the database in December 2006 with open-access data on two studies; the open-access section allows users to view study documents, such as protocols, as well as summaries of the genotype and phenotype data. The controlled-access portion of the database is just now coming online; with authorization, it provides downloads of individual-level genotype and phenotype data that have been de-identified (i.e., stripped of personal identifiers such as names).

Beginning on or around May 24, researchers may request access to the individual-level data from several studies: the Age-Related Eye Disease Study (AREDS) on macular degeneration and cataracts, the National Institute of Neurological Disorders and Stroke Parkinsonism Study, and six studies conducted under the Genetic Association Information Network (GAIN). The data from AREDS, the Parkinsonism Study, and a GAIN study on attention deficit hyperactivity disorder are expected to be available for download around June 9. Data from the other five GAIN studies will be rolled out over the next six months as they become available. Many other studies will be added to dbGaP in the future, including the Framingham SHARe Study, which is associating genotype data with phenotype information collected in the landmark Framingham Heart Study.

How to Apply for Access to Controlled-Access Data

To request access to any of the individual-level datasets within the controlled-access portions of the database, the Principal Investigator (PI) and the Signing Official (SO) at the investigator’s institution must co-sign a request for data access, which contains a data user certification (DUC). The request is reviewed by an NIH Data Access Committee at the appropriate Institute or Center. This step uses the SF 424 (R&R) form, and both the PI and the SO need accounts with the NIH eRA Commons. These are the same accounts used to apply for grants, and PIs and SOs who already have such accounts do not need to do anything further to make them applicable to the dbGaP controlled-access authorization process. Information on applying for an eRA Commons account is available on the NIH eRA Commons website.

The DUC statements outline policies and procedures for using the data, such as limiting use to the project described in the Data Access Request form; not distributing the data beyond those permitted to handle it; not attempting to identify or contact study participants from whom phenotype data and DNA were collected; being aware of the specified principles regarding intellectual property; adhering to policies regarding the timeframe for publications stemming from the data; and following other provisions designed to protect the confidentiality of study participants and to foster scientific advances.

More information on dbGaP, along with links to the information pertaining to each available dataset, can be found on the dbGaP website.


National Center for Biotechnology Information
National Library of Medicine
Building 38A
Bethesda, MD 20894
Voice: (301) 496-2475
Fax: (301) 480-9241