Availability of Data from the Tumor Sequencing Project (TSP) Supported by the National Human Genome Research Institute, NIH

Notice Number: NOT-HG-07-003

Key Dates
Release Date: November 21, 2006

Issued by
National Human Genome Research Institute (NHGRI) (http://www.genome.gov)

The purpose of this Notice is to alert the scientific community to the initial release of data from a pilot project to sequence the exonic regions of 1,000 genes in nearly 200 specimens of lung adenocarcinoma. Information about this project and links to the data can be obtained from: http://www.genome.gov/cancersequencing


The NHGRI-supported large-scale sequencing centers have begun to tackle the unique challenges associated with using high-throughput DNA sequencing to characterize tumor genomes, prior to the full-scale implementation of The Cancer Genome Atlas (http://cancergenome.nih.gov/index.asp), through participation in the Tumor Sequencing Project (TSP). The TSP Consortium is a collaboration among participants at the Baylor College of Medicine Human Genome Sequencing Center, the Broad Institute Genome Sequencing Platform, the Dana-Farber Cancer Institute, the Memorial Sloan-Kettering Cancer Center, the Genome Sequencing Center and Siteman Cancer Center at Washington University, the M.D. Anderson Cancer Center, and the University of Michigan Medical Center. The TSP is implementing approaches to large-scale identification of genomic changes in tumors, including directed sequencing of the exonic regions of approximately 1,000 genes in nearly 200 specimens of adenocarcinoma of the lung, as well as the use of high-density SNP genotyping arrays to identify, at high resolution, changes in regional chromosomal copy number. A detailed description of this project is provided in the TSP Consortium white paper: A Proposal for a Technical Demonstration Project.

As with all large-scale projects supported by the National Human Genome Research Institute, the data release policy for TSP is consistent with the Institute's goal of rapid, public release of data, except where release could conflict with protecting the privacy of research participants. (See the Large-scale Sequencing Data Release Policy: http://www.genome.gov/10506537 and the report from The Cancer Genome Atlas Data Release Workshop: http://cancergenome.nih.gov/components/TCGA_101706.pdf.) TSP investigators have rapidly released, and will continue to release, as much enabling data as possible without restrictions or controls, while adhering to the regulations and practices governing human subjects research and respecting the privacy of the research participants. The specimens being used in TSP have been anonymized, with no link maintained between the sequence data and the participant identities. To further safeguard the privacy of research participants, the data management plan includes two levels of access for the project data: open and controlled access.

Open-access data are defined as data that cannot be used to identify a participant and are released to public databases. These data include: anonymous sequence traces; summary SNP array data on copy number alterations and loss of heterozygosity; validated somatic mutations; and a list of targeted genes for the TSP.

Controlled-access data are defined as data that potentially can be used to identify a research participant and are released only to a database accessible to approved researchers. To obtain access to the data, researchers must agree to terms and conditions of data use, including assurances that they will protect participant genotype information and follow institutional policies for human subjects research. Controlled-access data will include: raw and processed genotype data from SNP arrays and a sequence trace identifier linking table. NHGRI is developing a Data Use Certification document and Data Access Request procedures to provide a mechanism of approval for the controlled-access data. These systems will be available in early 2007.
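
For readers who want to encode the two access levels in their own data-handling scripts, the classification described above can be sketched as a simple lookup. This is an illustrative sketch only, not part of any NHGRI system: the names AccessTier, DATA_TIERS, and can_access are hypothetical, and the single approved_researcher flag is an assumption standing in for the Data Use Certification and Data Access Request procedures, which this Notice does not specify in detail.

```python
from enum import Enum


class AccessTier(Enum):
    """The two access levels named in this Notice."""
    OPEN = "open"
    CONTROLLED = "controlled"


# Data types as listed in this Notice. The string labels are shorthand
# chosen for this sketch, not official identifiers.
DATA_TIERS = {
    "anonymous sequence traces": AccessTier.OPEN,
    "summary SNP array data": AccessTier.OPEN,
    "validated somatic mutations": AccessTier.OPEN,
    "list of targeted genes": AccessTier.OPEN,
    "raw and processed SNP array genotype data": AccessTier.CONTROLLED,
    "sequence trace identifier linking table": AccessTier.CONTROLLED,
}


def can_access(data_type: str, approved_researcher: bool) -> bool:
    """Return True if a requester may obtain the given data type.

    Open-access data are available to anyone. Controlled-access data
    require an approved researcher (assumed here to be a single flag).
    """
    tier = DATA_TIERS[data_type]
    if tier is AccessTier.OPEN:
        return True
    return approved_researcher
```

Under these assumptions, can_access("validated somatic mutations", False) is permitted, while can_access("sequence trace identifier linking table", False) is refused until the requester has been approved.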

Primary sequence data are hosted at the National Center for Biotechnology Information. The National Cancer Institute Center for Bioinformatics is hosting all other TSP data. Data linking tables are provided at http://www.genome.gov/cancersequencing.

Bradley A. Ozenberger, Ph.D.
Division of Extramural Research
National Human Genome Research Institute
National Institutes of Health, DHHS
Suite 4076 - MSC 9305
5635 Fishers Lane
Bethesda, MD 20892-9305
(express/courier services should be directed to Rockville, MD 20852)
Telephone: (301) 496-7531
FAX: (301) 480-2770
Email: bozenberger@mail.nih.gov

