Notice of Joint DMS/NLM Initiative on Generalizable Data Science Methods for Biomedical Research.

Notice Number: NOT-LM-19-001

Key Dates
Release Date: October 10, 2018

Related Announcements

Issued by
National Library of Medicine (NLM)


Significant advances in technology, coupled with decreasing costs associated with data collection and storage, have resulted in unprecedented access to vast amounts of health- and disease-related data. Biomedical data include genomics data from next-generation sequencing; data from different imaging modalities; real-time and static data from wearable electronics, personal mobile devices, and environmental sensors; observational health data; and clinical data from hospitals, insurance, and electronic medical records, including personal health records. The National Library of Medicine (NLM) and the Division of Mathematical Sciences in the Directorate for Mathematical and Physical Sciences (DMS) at the National Science Foundation (NSF) recognize the need to support research to develop innovative and transformative mathematical and statistical approaches to address important data-driven biomedical and health challenges. The goal of this interagency program is the development of generalizable frameworks combining first principles, science-driven models of structural, spatial, and temporal behaviors with innovative analytic, mathematical, computational, and statistical approaches that can portray a fuller, more nuanced picture of a person's health or the underlying processes.

This program is designed to foster inter- and multi-disciplinary collaborations. Collaborative efforts that bring together researchers from the biomedical/health and the mathematical/statistical sciences communities are a requirement for this program and must be convincingly demonstrated in the proposal. Of particular interest are new collaborative efforts involving mathematicians, statisticians, biomedical scientists, and clinicians aimed at blending first principles, science-based models with innovative data-driven and machine learning approaches to solve important biomedical problems. While the research may be motivated by a specific application or dataset, the development of methods that are generalizable and broadly applicable is preferred and encouraged.

Some of the important application areas currently supported by the National Library of Medicine include the following:

  • Finding biomarkers that support effective treatment through the integration of genetic and Electronic Health Records (EHR) data;
  • Understanding epigenetic effects on human health;
  • Extracting and analyzing information from EHR data;
  • Understanding the interactions of genotype and phenotype in humans by linking human sensor data with genomic data using dbGaP;
  • Protecting confidentiality of personal health information;
  • Mining of heterogeneous data sets (e.g., clinical and environmental).

This list is not intended to be exhaustive or exclusive. However, proposals should clearly discuss how the intended new collaborations will address a biomedical challenge and describe the use of publicly available biomedical datasets to validate the proposed models and methodology. NIH datasets related to the research themes listed above include

Applicants are expected to list specific datasets that will be used in the proposed research and demonstrate that they have access to these datasets. The Data Management Plan should describe plans to make the data available to researchers if these data are not in the public domain.

This program is designed to promote the development of sophisticated mathematical, statistical, or computational models and methods to address biomedical data science challenges, such as

  • Modeling and integration of heterogeneous data from different sources;
  • Incorporation of synthetic data to address bias in a data set;
  • Development of methods to handle spatio-temporal dependencies and missingness;
  • Causal inference and machine learning;
  • Model validation, uncertainty quantification, evaluation, reproducibility, and metrics for FAIR (findable, accessible, interoperable and reusable);
  • Natural Language Processing approaches that address combinations of structured/unstructured text.

Application submission is through the National Science Foundation via solicitation NSF-19-500; information concerning the application and review process is available from NSF. For those applications that are being considered for potential funding by NLM, the PDs/PIs will be required to submit their applications in an NIH-approved format. PDs/PIs invited to submit to NIH will receive further information on submission procedures. An applicant will not be allowed to increase the proposed total budget or change the scientific content of the application in the submission to the NIH. The results of the first-level scientific review will be presented to the NLM Board of Regents for the second level of review. NLM will make final funding determinations and issue Notices of Award to successful applicants.

NLM and DMS anticipate making 8 to 10 awards totaling up to $4 million in fiscal year 2019. Each award is expected to be between $200,000 and $300,000 (total costs) per year, with durations of up to 3 years.


Please direct all inquiries to:

Jane Ye, PhD
National Library of Medicine (NLM)
Telephone: 301-594-4882