Enhancing Data Usage and Utility to Advance Biomedical Research
When beginning your next investigator-initiated application, consider the following NIH highlighted topic. The area of science described below is of interest to the listed NIH Institutes, Centers, and Offices (ICOs). This is not a notice of funding opportunity (NOFO).
Apply through an appropriate NIH Parent Funding Announcement or another broad NIH opportunity available on Grants.gov. Learn how to interpret and use Highlighted Topics.
Topic Description
Post Date: June 24, 2026
Expiration Date: June 24, 2028
Background
NIH aims to maximize the return on research and clinical care through investments in data generation, data infrastructure, cloud resources, FAIR Principles-guided data management and sharing, and data science. Additionally, there is a growing need for novel statistical and computational methods to address unique designs and complex data types. Multiple NIH ICOs have independently invested in efforts that demonstrate the value of data sharing and use. However, more strategic alignment and coordination across NIH ICOs are needed to address data underutilization and empower the research community to generate data-driven insights, validate findings, and accelerate translation to improved health outcomes.
Purpose
This Highlighted Topic encourages rigorous, innovative research in utilizing publicly accessible, high-quality datasets. It aims to enhance reproducibility, enable data-driven hypothesis generation, accelerate scientific discovery, and ultimately improve disease prevention, diagnosis, treatment, and patient outcomes.
Exemplary areas of interest include:
- Developing and applying advanced computational methods (including AI/ML) to enable data discovery, reuse, and access across distributed ecosystems, standardized data extraction, interoperability, and integration across diverse biomedical, clinical, environmental, behavioral, and population-level datasets.
- Developing novel biostatistical methods to address complex study designs and analytic challenges and applying them to relevant health data.
- Enhancing rigor, reproducibility, and data quality through replication, validation, cross-cohort analyses, and development of metrics to assess completeness, bias, and utility.
- Designing, evaluating, and implementing privacy-preserving strategies (e.g., federated learning, differential privacy) to enable responsible data sharing and use.
- Advancing predictive, translational, and intervention research by modeling health and disease trajectories across the lifespan, identifying and validating biomarkers or other measurable indicators, and enabling innovative, adaptive, and pragmatic study designs.
- Enabling secondary data analysis to improve population health and health systems by informing prevention, clinical care, and implementation strategies.
Participating ICOs
Additional NCI interests include but are not limited to:
- Elucidating mechanisms of tumor initiation, progression, relapse, and therapeutic resistance.
- Enabling novel adaptive clinical trial designs via digital twins, synthetic control arms, or other approaches, or AI/data-driven accrual and stratification.
- Integrating multi-omic, multi-modal data to improve predictions of risks, toxicity and response of therapies, and interventions to address population and health system characteristics.
Applicants are encouraged to discuss potential research projects (e.g., R01, R03, R21) with program staff before submission. ICOs may dedicate funds to support applications in this Topic area, depending upon the availability of funds, the number of meritorious applications, and competing ICO priorities.
Emily Boja
[email protected]
Jiayin (Jerry) Li
[email protected]
Danielle Daee, Ph.D.
[email protected]
Miguel R. Ossandon, Ph.D.
[email protected]
Wendy Wang, Ph.D.
[email protected]
NCCIH encourages applications leveraging data science, AI/ML, and advanced computational approaches to improve rigor, reproducibility, and translation of complementary and integrative health (CIH) research. NCCIH supports tools and methods that enable integration, secure use, and analysis of data to accelerate discovery and improve whole person health outcomes. Research areas include but are not limited to:
- Development of AI/ML, causal AI, and computational tools to harmonize, integrate, and visualize multiscale CIH data
- Development of methods to assess data quality, reproducibility, and rigor in CIH and whole person research
- Development of privacy-preserving and secure data-sharing approaches (e.g., federated learning) to enable multisite CIH analyses
- Development of AI-enabled models to predict symptom trajectories, treatment response, and patient-reported outcomes
- Data-driven approaches to improve access, recruitment, personalization, and implementation of CIH interventions
Emrin Horgusluoglu
[email protected]
NEI has supported vision research that generated rich multimodal datasets spanning imaging, visual function testing, multi-omics, and clinical phenotypes across diverse eye diseases. Prioritized areas for this HT include, but are not limited to:
- Harmonizing Optical Coherence Tomography +/- Angiography (OCT/OCTA), fundus photography, other imaging modalities, and visual fields across devices, vendors, and institutions.
- Creating open-access real-world datasets aligned with FAIR principles to enable knowledge discovery and clinical applications.
- Integrating imaging with structured clinical data and patient-reported outcomes to improve understanding of disease onset and progression.
- Developing tools and metrics to assess data quality, reliability, interoperability, and model transportability across settings.
- Integrating multi-omics and multimodal data across basic, preclinical, and clinical research to enable mechanistic insights, biomarker discovery, and therapeutic development.
James Gao, Ph.D.
[email protected]
- NHGRI is interested in secondary analysis research that leverages existing genomic and multi-omic datasets to advance understanding of gene-disease relationships, improve genomic variant interpretation, enhance diagnostic yield in rare and undiagnosed diseases, develop and validate polygenic risk models, and accelerate the translation of genomic discoveries into clinical care.
- NHGRI also encourages innovative research through secondary analyses of data accessible via the NHGRI AnVIL Cloud platform, utilizing the platform’s comprehensive analysis tools and services.
NHGRI Research Funding
[email protected]
Examples of research areas relevant to NIA include, but are not limited to:
- Secondary analyses of existing data to elucidate the etiology, disease trajectories, and risk factors influencing the development and progression of Alzheimer’s disease and related dementias (AD/ADRD), aging-related diseases, and comorbidities
- Secondary analyses of longitudinal cohorts and linked biomedical, administrative, and social data to identify drivers of variation in aging-related chronic conditions, including AD/ADRD
- Use of existing data to uncover molecular, genetic, cellular, and physiological mechanisms underlying aging and age-related changes across the lifespan in humans and other organisms
- Development of analytic methods and tools to improve the use and interpretability of large datasets on AD/ADRD, aging-related diseases, and comorbidities
- Efforts to enhance the accessibility and utilization of NIA-supported repositories to accelerate discoveries in age-related conditions and comorbidities
Rebekah Feng, Ph.D.
[email protected]
Damali Martin, Ph.D., MPH
[email protected]
Rosaly Correa-de-Araujo, MD, M.Sc., Ph.D.
[email protected]
Yi-Ping Fu, Ph.D.
[email protected]
Frank Bandiera, Ph.D., MPH
[email protected]
NIAAA’s areas of interest include, but are not limited to:
- Integrate clinical (including laboratory and imaging measures), behavioral, environmental, real-world, and omics data to study alcohol use patterns, alcohol use disorder (AUD) progression, and related health outcomes, including comorbid mental and physical health conditions, across the lifespan, to inform prevention and treatment.
- Develop and apply computational and machine learning methods to harmonize alcohol measures and standardize alcohol-related phenotypes to improve reproducibility and cross-study comparability.
- Identify and validate biomarkers and predictive models for alcohol misuse and AUD risk, treatment response, relapse, and recovery.
- Use established NIH cohorts, such as All of Us, Adolescent Brain Cognitive Development (ABCD), HEALthy Brain and Child Development (HBCD), and Add Health, to advance data-driven, individualized prevention and treatment strategies.
Wenxing Zha, Ph.D.
[email protected]
Elizabeth Powell, Ph.D.
[email protected]
Chamindi Seneviratne, M.D.
[email protected]
NIAMS seeks to maximize the scientific value of existing datasets by supporting secondary analyses and the development of innovative analytical methodologies that advance research in arthritis, musculoskeletal, and skin diseases.
Two research approaches may be considered:
- secondary analyses of existing data from databases (e.g., electronic health records, registries, population-based cohorts, surveys, imaging, labs, claims, environmental data, and multi-omics) relevant to NIAMS mission areas, including biomedical, clinical, or public health, and
- development of statistical, computational, or data science methodologies that enhance existing approaches for analyzing complex health data relevant to NIAMS scientific priorities.
ICO Scientific Contact: Kamil Barbour, PhD
[email protected]
This topic invites projects that leverage and enhance existing social science, behavioral, administrative, clinical, and neuroimaging datasets through innovative secondary analyses. Proposed research should:
- strengthen the rigor, reproducibility, and utility of data resources through methodological advances, such as improved data accessibility, integration, harmonization, interoperability, and reusability.
- generate knowledge to advance understanding of the causes, patterns, and health impacts of substance use, HIV, and related conditions.
- inform the development, evaluation, and regulatory advancement of safe and effective therapeutics and scalable, evidence-based strategies to improve prevention, treatment quality, and health outcomes for individuals and communities affected by substance use disorders (SUDs).
Marsha Lopez
[email protected]
Jana Drgonova
[email protected]
NIDCR supports applications that maximize the value of existing datasets through secondary analyses and innovative analytic methods to advance understanding, prevention, and treatment of dental, oral, and craniofacial (DOC) conditions. This includes:
- Secondary data analyses using existing data and databases relevant to DOC and oral-systemic health and/or practice.
- Development of statistical or computational methodologies that are poised to improve or advance extant methods for analyzing DOC or oral-systemic health data.
Priority areas include behavioral sciences (behavior change, intervention mechanisms, nutrition, health education, adherence); clinical and population research (cohorts, trials, comparative effectiveness and safety, natural experiments, health economics, meta-analyses); and translational data science (multi-modal data integration, AI/ML/DL, federated learning frameworks, in silico validation, disease risk prediction).
William Elwood, PhD
[email protected]
Lorena Baccaglini, DDS, MS, PhD
[email protected]
Noffisat Oki, PhD
[email protected]
NIDDK has made substantial investments in large-scale clinical consortia, longitudinal cohorts, and data and sample repositories that have generated rich, multidimensional resources. Many of these datasets have been newly released, expanded, or harmonized through NIDDK-supported repositories, creating discovery opportunities beyond original scope.
Research priorities include:
- Integration, harmonization, and reanalysis of existing datasets and specimens to achieve a comprehensive understanding of patient health and identify novel mechanisms, subgroups, and trajectories;
- Generation of privacy-preserving synthetic clinical data and cohorts to support data sharing, method development, and benchmarking;
- Validation of AI/ML models; and
- Development and application of:
  - AI/ML tools to predict disease progression and outcomes; and
  - foundation models and methods that address analytical challenges.
Daniel Gossett, Ph.D. Kidney, Urology and Hematology
[email protected]
Xujing Wang, Ph.D. Diabetes, Endocrinology, and Metabolism
[email protected]
Veerasamy Ravichandran, Ph.D. Digestive Diseases and Nutrition
[email protected]
Rebecca Rodriguez, Ph.D. NIDDK Central Repository
[email protected]
Emily Leary, Ph.D. NIDDK Biostatistics Program
[email protected]
NIEHS is interested in:
- Analyses of existing data to identify the role of environmental exposures in disease etiology, disease mechanisms, characterization of the exposome, statistical methods development, and complex exposomics data integration
- Development of analytical pipelines and novel statistical or AI methods for complex exposomics data analysis, including open-source code and clear instructions for implementation
- Replication and validation of findings
- Leveraging large administrative datasets such as electronic health records and geospatial data to characterize environmental exposures and health
- Applying robust evidence from epidemiological, animal, and organoid studies to in silico trial designs such as digital twin studies
- Training new researchers in data reuse and secondary data analysis
Bonnie Joubert, Ph.D.
[email protected]
The NIMH supports secondary analyses of existing human mental health datasets that advance interoperability, reproducibility, and clinical actionability in mental health research and intervention. Priority areas include:
- Integration of clinical, cognitive, neuroimaging, neurophysiology, genomic, behavioral, and sensor and mobile/wearable device data to identify and validate markers of mental health risk and resilience, and to establish clinically meaningful definitions of sub-populations of individuals with, or at risk for, mental illnesses and related behaviors.
- Develop privacy-preserving approaches that strengthen causal inference, predictive modeling, and external validation, moving beyond exploratory correlations while delivering reusable, open workflows and tools to the research community.
- Explicitly characterize the heterogeneity within diagnostic categories to improve transdiagnostic utility and accelerate translation to real-world mental health clinical practice.
Christina Liu, PhD PE
[email protected]
Michele Ferrante, PhD
[email protected]
The National Library of Medicine (NLM) is committed to advancing rigorous, innovative research that leverages multi-modal, data-driven approaches and the secondary analysis of publicly accessible, high-quality datasets to accelerate scientific discovery and improve health outcomes. As the world’s largest biomedical library and a leader in biomedical informatics, NLM recognizes that rigorous secondary analysis of clinical, public health, and real-world data is fundamental to advancing discovery, strengthening reproducibility, and promoting optimal health outcomes for all. Leveraging diverse data sources, while ensuring privacy, security, and ethical stewardship, enables the development and validation of innovative analytic methods, including novel statistical, computational, and AI-driven approaches.
ICO Scientific Contact: Goutham Reddy, MD MS
[email protected]
ODSS encourages investigator-initiated proposals, such as those addressing the development and enhancement of standards, data models, and data sharing, to improve scientific discovery, reproducibility, accessibility, impact, and efficacy.
Shu Hui Chen
[email protected]
NIH Office of Data Science Strategy (ODSS)
[email protected]
The Office of Research on Women’s Health (ORWH) is interested in projects that:
- Incorporate consideration of sex as a biological variable (SABV) in discovery, replication, and reproducibility study designs and analyses
- Develop female-specific common data elements (CDEs), including for menopause
- Harness computational models to investigate sex differences
The Office of Autoimmune Disease Research in ORWH (OADR-ORWH) is interested in:
- Developing CDEs to support data extraction, harmonization, and interoperability for autoimmune disease research
- Utilizing federated data platforms to enhance pattern recognition in complex multiomic datasets, enabling insights into autoimmune disease pathogenesis, co-occurring autoimmune diseases, and shared pathogenic pathways
- Harnessing computational models to optimize clinical trial design, including use of digital twins and synthetic control arms, and leveraging existing and new clinical trial and registry data to advance autoimmune disease research
Elena Gorodetsky, M.D., Ph.D.
[email protected]
Victoria Shanmugam, MBBS, MRCP, FACR, CCD
[email protected]