NCI Request for Information (RFI): Input on Development of an NCI Cancer Biomarker Data Aggregator

Notice Number: NOT-CA-18-095

Key Dates
Release Date: July 23, 2018
Response Date: October 20, 2018

Related Announcements

Issued by
National Cancer Institute (NCI)


This Request for Information (RFI) seeks public input and opinions on the proposed development of a Cancer Biomarker Data Aggregator (CBAG), a federated data system and a crucial clinically oriented node of the NCI’s Cancer Research Data Commons (CRDC).


Powered by recent advancements in high throughput technologies, biomarker research is considered one of the most data-intensive areas of cancer research. The biomarker development process involves aggregation of massive amounts of complex, multi-modal data such as gene expression, cell types, imaging, and protein counts used in conjunction with clinical information and prior knowledge. To find links between potential biomarkers and disease phenotypes, a multitude of analytical approaches, from data modeling to machine learning, model selection, simulation techniques, and performance analysis, have been developed and employed. Nonetheless, despite considerable success in the preclinical setting, the majority of biomarker-based tests have yet to demonstrate clinical validity or clinical utility, especially in the area of risk assessment, early detection, and diagnosis of cancer, where the bar for test performance is set increasingly high. One of the fundamental challenges in biomarker discovery and validation is the lack of an efficient data infrastructure. Such infrastructure could facilitate annotation as well as the secondary use of high-value datasets for biomarker selection for new indications, validation or meta-analysis, benchmarking of selection algorithms, post-market test surveillance, and, ultimately, assembling the evidence jigsaw for clinical utility of specific biomarkers and tests in patient populations.

Recently, the NCI initiated the development of the Cancer Research Data Commons (CRDC) to facilitate access to essential datasets, computing resources, and tools. Using the experience gained from these efforts, the NCI is now proposing a pilot Cancer Biomarker Data Aggregator (CBAG) as a node of the CRDC to test different ways to accelerate biomarker research and its translation into clinically useful tests. The primary goal of the proposed node will be to provide streamlined access to relevant tools, services, and curated information on biomarkers with potential clinical utility for risk assessment, early detection, or diagnosis of cancer. In this capacity, the CBAG will complement other CRDC nodes as well as commercial databases such as BiomarkerBase and Gobiom, which are focused chiefly on biomarkers for treatment response and therapeutic target discovery.

To ensure that the pilot CBAG will support the needs of the cancer research community and advance biomarker discovery and development, we wish to engage potential CBAG users from academia, industry, government, and private organizations.

Information Requested

The NCI is interested in soliciting suggestions and opinions regarding the scope, use cases, clinical needs, and priority areas that the pilot CBAG should focus on to accelerate the development of biomarkers and tests for risk assessment and early detection of cancer. Areas of interest include but are not limited to the following:

  • Use cases to be included in the pilot node, such as leveraging data from longitudinal studies containing biomarkers with putative clinical utility in hereditary, familial, and sporadic cancers
  • Examples of secondary data uses such as selection of biomarkers for new indications; model selection, development, and comparison; and training and education
  • Challenges related to sharing, harmonization, and curation of biomarker data and metadata
  • Challenges related to mining and integrating biomarker data and prior knowledge from the literature, free-text clinicians' notes, and public knowledgebases
  • Inclusion/exclusion criteria for pilot data collections such as data and annotation quality, study design, sample size, pre-diagnostic specimens, and patient-level metadata
  • Incentives for biomarker data sharing, annotation, and adoption of standards, including crowdsourcing competitions, data citations or usage statistics, open-data badges for papers, and tokens for services or access to other datasets
  • Incentives for collaborative analytics and annotation using machine learning, Natural Language Processing, statistical modeling, simulation techniques, and performance analysis tools for biomarkers and tests

Submitting a Response

All responses must be submitted to by October 20, 2018.

Please include the Notice number (NOT-CA-18-095) in the subject line. Please be as specific as possible, provide examples or data to support your suggestions, prioritize comments, and include new ideas relevant to the question being asked. Please do not include any proprietary, classified, confidential, or sensitive information in your response. NIH will use information submitted in response to this RFI at its discretion and will not provide comments on any responder's submission. NIH may use information gathered by this RFI to inform the development of future funding opportunity announcements or in any resultant solicitations.

This RFI is for information and planning purposes only and should not be construed as a solicitation or as an obligation on the part of the Federal Government, the National Institutes of Health (NIH), or individual NIH Institutes and Centers. NIH does not intend to make any awards based on responses to this RFI or to otherwise pay for preparation of any information submitted or for the Government's use of such information. No basis for claims against the U.S. Government shall arise as a result of a response to this request for information or from the Government’s use of such information. Responses will be aggregated and may be shared publicly.


[1] NCI Cancer Research Data Commons (CRDC)

[2] NCI Genomic Data Commons (GDC)

[3] NIH Strategic Plan for Data Science

[4] NIH Genomic Data Sharing Policy (NOT-OD-14-124)

[5] NIH Notice for Use of Cloud Computing Services for Storage and Analysis of Controlled-Access Data Subject to the NIH Genomic Data Sharing Policy (NOT-OD-15-086)

[6] NCI Cancer Immunologic Data Commons (RFA-CA-17-006)

[7] Input on Development of the NCI Imaging Data Commons (NOT-CA-18-060)

[8] NCI Strategies for Matching Patients to Clinical Trials (NOT-CA-18-063)

[9] NIH Biomedical Data Translator (NOT-TR-17-023)

[10] Secondary Analysis and Integration of Existing Data to Elucidate the Genetic Architecture of Cancer Risk and Related Outcomes (PA-17-239)


Please direct all inquiries to:

For general inquiries regarding early detection and risk assessment of cancer, please contact:

Sudhir Srivastava, MPH, PhD
National Cancer Institute
Telephone: 240-276-7028

For inquiries specifically related to this RFI, contact:

Natalie Abrams, PhD
National Cancer Institute
Telephone: 240-276-5506