Critical resource gaps and opportunities to support Next Generation Sequencing (NGS) test development, validation, and data interpretation, including through the use of technologies such as artificial intelligence (AI)/machine learning (ML)
Notice Number:

Key Dates

Release Date:
August 3, 2021

Response Date:
November 1, 2021

Related Announcements

Companion Request for Information: Critical resource gaps and opportunities to support radiological tool development and clinical data interpretation using artificial intelligence (AI)/machine learning (ML) (NOT-OD-21-163)

Issued by

Office of The Director, National Institutes of Health (OD)

U.S. Food and Drug Administration (FDA)


The National Institutes of Health (NIH) and the Food and Drug Administration (FDA) are requesting information from stakeholders on what critical resource gaps exist to support Next Generation Sequencing (NGS) test validation and development (e.g., highly characterized reference materials, infrastructure) and tool development and data interpretation (e.g., AI/ML technologies). This RFI is being released in parallel with a companion RFI (NOT-OD-21-163) focused on reference material gaps for radiology. If desired, respondents may provide comments that encompass both foci where the fields converge (e.g., linking tumor features with sequencing data). The comment period on this notice is 90 days. Response to this notice is voluntary.


In NGS, demonstrating analytical validity and concordance of results obtained with different methodologies or between different testing sites is critical for ensuring reproducible science (1) and generalizability to the clinical setting. Reference materials are essential for methods development and validation (e.g., to assess reproducibility of results). However, it is widely recognized that many existing resources are insufficient; highly characterized, ethnically diverse, broadly available, and sustainable reference materials are still lacking. Moreover, there is no community consensus on the standards that need to be developed and made available to drive the field forward. This ground truth gap is frequently identified as a limiting factor that impedes high-quality research, development, validation, and regulatory science.

Information Requested

The NIH and the FDA are interested in receiving input on the greatest needs and opportunities related to the following four topics: development of reference samples, tools, and infrastructure for clinical and translational research using NGS; application of AI/ML to the interpretation of NGS data and multi-domain data; existing resources that could be leveraged to fill resource gaps; and any general comments related to critical resource gaps and opportunities to support NGS test development and validation.

The NIH and FDA are also interested in the opportunities to improve understanding, efficiency, or transparency of the processes associated with these topics. NIH and FDA welcome input from research investigators, study participants, professional organizations, and other interested members of the public. Respondents are free to address any or all of the information listed below or any relevant topic for NIH and FDA to consider. Respondents should not feel compelled to address all items.

Topic 1: Development of reference samples, tools, and infrastructure for clinical and translational research using NGS

The lack of well-characterized and widely available somatic and germline samples makes NGS test and methodology validation across laboratories difficult. To enable efficient standardization and harmonization of NGS, a diverse range (2, 3) of appropriately consented and replenishable reference samples, representing diverse ethnicities and most potential variations of interest, must be made available. In addition, the infrastructure to host and support these materials is inadequate, and the tools available for analyzing reference materials in relation to the product of interest are limited. NIH and FDA seek broad input on the following areas:

  • Physical reference samples representing a variety of variant types, including specific clinically relevant variants in various genomic contexts (e.g., homopolymeric regions, pseudogenes). Such samples could be made available through public or commercial resources. Potential needs include tumor/normal pairs (cell lines and clinical material); specific variant types (e.g., fusions, copy number variations (CNVs), structural variations (SVs), ploidy); edge cases in clinical diagnostics; and in-silico reference materials. Respondents interested in commenting on this topic may find it helpful to review the Medical Device Innovation Consortium (MDIC) landscape analysis report relating to NGS reference samples.
  • Tools for data analysis, interpretation, and comparative assessments. Potential needs include data distribution and analysis models (e.g., methods for predicting tertiary structure from the linear sequence); data integration methodology (e.g., integration from different platforms, different runs); and validation of data analysis and interpretation methodologies.
  • Infrastructure for storage and analysis of genomic data (and other types of data such as imaging, proteomics, electronic health records (EHR), when analyzed together) with the appropriate policies and controls in place to ensure responsible data sharing and data use (e.g., privacy protections, consent requirements, compliance with applicable laws and regulations).

Topic 2: Application of AI/ML to the interpretation of NGS data and multi-domain data

There are significant gaps in the availability of validated NGS datasets and related resources. Addressing these gaps could foster the development and validation of the next generation of AI/ML algorithms capable of analyzing NGS data (and ideally data from multiple clinical domains) to provide researchers, clinicians, physicians, and patients with new big data insights on the detection, characterization, treatment, and drug resistance of cancers and other diseases. NIH and FDA seek broad input on the following areas:

  • Robust, ethnically diverse datasets that are Findable, Accessible, Interoperable, and Reusable (FAIR) with reference standards linking disease progression and genetic data. While tremendous amounts of data are generated through clinical practice and ongoing research studies (4, 5, 6, 7), significant gaps for AI/ML use remain. Potential needs include generation/acquisition of patient outcome data; extraction of information from unstructured EHR data; generation of NGS datasets (using an appropriate protocol, best practices, multiple sites, availability and annotation); availability of datasets with comprehensive clinical and genomic data; extracting and linking genetic datasets to a common dataset for algorithm development and testing (e.g., NCI Cancer Research Data Commons Aggregator); defining driver and resistance mutations; and intersections with proteomics and genomics (i.e., proteogenomics).
  • Methodology and analysis platforms (e.g., whole genome sequencing, total RNA sequencing, DNA methylation) to facilitate generation/curation of large, diverse datasets for research and development. There may be critical methodology and platform needs to curate, annotate, and pre-process data for learning in both health and disease; combine NGS datasets with a reference standard/truth; model disease characteristics, prognosis, progression, and treatment; identify predictive biomarkers, extract/analyze features; predict type of variant or risk; and perform real-world monitoring of adaptive/un-locked AI/ML algorithms. The algorithmic needs for the development and validation of AI/ML algorithms may go beyond the training aspect of AI/ML.

Topic 3: Existing resources that could be leveraged to fill resource gaps

NIH and FDA are also interested in receiving broad input about existing resources that could be leveraged to fill gaps identified by respondents. When identifying any relevant, existing resources, commenters may wish to include the following information in their responses where applicable:

  • Resource name and link
  • Institution or body that manages the resource
  • Disease focus, if any
  • Specimen types currently available, and number of participants
  • Primary data types currently available that could be used as "ground truth" and number of participants
  • Future releases and timeline if available
  • Supported/linked data types
  • Availability of clinical outcomes data
  • Level of characterization
  • Access model (e.g., unrestricted, controlled access)
  • Whether samples/data were collected under an informed consent that supports broad and responsible data and biospecimen sharing

Topic 4: General Comments

NIH and FDA welcome general information on any other topics with regard to critical resource gaps and opportunities to support NGS test development and validation, including the use of technologies such as AI/ML to support NGS tool development and data interpretation.


1. NIH Rigor and Reproducibility

2. OncoSpot NGS Reference Standards

3. Seraseq NGS Reference Standards

4. The Cancer Genome Atlas (TCGA)

5. Clinical Proteomic Tumor Analysis Consortium (CPTAC)

6. Applied Proteogenomics Organizational Learning and Outcomes (APOLLO)

7. The Cancer Imaging Archive (TCIA)

Submitting a Response

Responses should be submitted electronically by November 1, 2021, using the online form. You may provide responses to one or all of the topics in the comment boxes. Responses received will be posted without change after NIH and FDA have reviewed all of the responses received. Please do not include any proprietary, classified, confidential, or sensitive information in your response.

This Request for Information (RFI) is for planning purposes only and should not be construed as a policy, solicitation for applications, or as an obligation on the part of the Government to provide support for any ideas identified in response to it. NIH and FDA may use information gathered by this RFI to inform development or modification of websites, policies and practices, processes and procedures, and supporting documentation.


Please direct all inquiries to:

NIH Office of Science Policy
Division of Clinical and Healthcare Research Policy
