Notice of Special Interest (NOSI): Computational and Statistical Methods to Enhance Discovery from Health Data

Notice Number: NOT-LM-19-003

Key Dates
Release Date: March 19, 2019

Related Announcements

Issued by
National Library of Medicine (NLM)


The National Library of Medicine is issuing this Notice to highlight its interest in receiving grant applications through NLM Research Grants in Biomedical Informatics and Data Science (R01 Clinical Trial Optional) (PAR-18-896), focused on research to reduce or mitigate gaps and errors in health data sets.


Recent successes with the use of data-centric artificial intelligence (AI) methods such as deep learning are stimulating interest in the promise of harnessing large and complex digital health data sets to advance the goals of precision medicine. Applying AI methods to large health data sets promises to provide new powers of discovery, diagnosis, prediction, and decision support aimed at improving health outcomes and reducing healthcare costs. Numerous public data sets of human and non-human data are available, and a rich array of specialized tools and platforms can be used in studies and applications.

However, recent work in identifying and addressing systematic biases and blind spots in data, and in the AI systems derived from that data, has highlighted an array of potential problems with fairness, accuracy, safety, and reproducibility of inferences and conclusions. Work on bias and incompleteness in health data sets includes studies that find poor representation of minority groups, seniors, and women. A recent Wall Street Journal article noted that computational tools developed by a diverse team can help avoid bias in algorithms. Beyond problems with biases and other gaps in data, research using health data from humans requires special care to protect the sources and the data. The All of Us Research Program aims to develop an unbiased, representative health data resource, but there are many other health data sets already in use or being constructed. Tools developed using biased and incomplete data sets may contribute to erroneous analyses. Statistical fallacies and representational errors unrelated to the research question at hand could introduce systematic errors.

The core questions for understanding and mitigating these and other problems in health data research are: "What can be done, computationally and/or statistically, to reduce or mitigate gaps and errors in data sets used for health research?" and "How can we improve the tools used for discovery, understanding, and visualization in health data sets and their analyses?" Whether the problem is due to incomplete health data or inadequate tools, approaches are needed to strengthen the reproducibility and applicability of data-centered research on the etiology, epidemiology and treatment of health conditions.

Research Objectives

NLM invites research grant applications that propose state-of-the-art methods and approaches to address problems with large health data sets or the tools used to analyze them, whether the data are drawn from electronic health records, public health data sets, biomedical imaging, omics repositories, or other biomedical or social/behavioral data sets. Areas of interest include, but are not limited to: (1) developing and testing computational or statistical approaches applied to large and/or merged health data sets holding human or non-human data, with a focus on understanding and characterizing the gaps, errors, biases, and other limitations in the data or inferences based on the data; (2) exploring approaches to correcting biases or compensating for missing data, including the introduction of debiasing techniques and policies or the use of synthetic data; (3) testing new statistical algorithms or other computational approaches to strengthen research designs for use with specific types of biomedical and social/behavioral data; (4) generating metadata that adequately characterize the data, including their provenance, intended use, and the processes by which they were collected and verified; and (5) improving approaches for integrating, mining, and analyzing health data that preserve the confidentiality, accuracy, completeness and overall security of the data. Applicants should address ethical issues that might arise from their proposed approach.

Application and Submission Information

Applications in response to this Notice must be submitted through NLM’s funding opportunity announcement, PAR-18-896: NLM Research Grants in Biomedical Informatics and Data Science (R01 Clinical Trial Optional).

All instructions for PAR-18-896 must be followed.

Submissions should indicate that they are in response to NOT-LM-19-003 in Field 4.b on the SF 424 R&R form.

Program Directors/Principal Investigators (PDs/PIs) planning to submit applications on this topic are strongly encouraged to contact the scientific contact listed in this Notice for advice on the appropriateness of a potential application and alignment with NLM’s program priorities.


Please direct all inquiries to:

Alan Vanbiervliet, PhD
National Library of Medicine/Extramural Programs
Telephone: 301-594-4882