Request for Information on Responsibly Developing and Sharing Generative Artificial Intelligence Tools Using NIH Controlled Access Data
Notice Number:
NOT-OD-25-118

Key Dates

Release Date:

May 30, 2025

Response Date:
July 16, 2025

Related Announcements

  • March 28, 2025 - Protecting Human Genomic Data when Developing Generative AI Tools and Applications. See notice NOT-OD-25-081
  • August 27, 2014 - NIH Genomic Data Sharing Policy. See notice NOT-OD-14-124 

Issued by

NATIONAL INSTITUTES OF HEALTH (NIH)

Purpose

NIH is requesting input on effective strategies for mitigating the leakage of controlled-access human genomic data when generative AI tools and applications are developed and shared. Importantly, these strategies should still enable widespread innovation and adoption of responsible AI for biomedical research.

Background

Training generative AI tools can involve an extensive range of data sources, which may include human genomic data from NIH controlled-access data repositories. As these tools become more sophisticated, they increasingly risk exposing the underlying data on which they were trained, potentially posing privacy risks to research participants. For example, generative AI tools and applications may memorize and leak their underlying training data, creating serious security and privacy risks when these tools are retained and shared. Consequently, researchers have developed several privacy-preserving techniques and other strategies to mitigate these risks during the development, retention, and sharing of generative AI tools.
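
As a purely illustrative aside, and not part of this notice's policy statements, the Python sketch below shows one simplified way memorization can surface as leakage risk: a membership-inference-style check that flags a record the trained model fits markedly better than comparable records it plausibly never saw. The helper name, inputs, and scoring logic are hypothetical assumptions for illustration only.

    import numpy as np

    def membership_inference_score(model_loss, candidate_record, reference_records):
        # Illustrative only: `model_loss` is any callable returning a trained
        # generative model's per-record loss (e.g., negative log-likelihood).
        # A candidate record with a much lower loss than comparable records the
        # model plausibly never saw is one signal of possible memorization.
        candidate_loss = model_loss(candidate_record)
        reference_losses = np.array([model_loss(r) for r in reference_records])
        # z-score of the candidate's loss against the reference distribution;
        # strongly negative values indicate anomalously good fit.
        return (candidate_loss - reference_losses.mean()) / (reference_losses.std() + 1e-12)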

To uphold the principles espoused in the NIH Genomic Data Sharing (GDS) Policy, NIH has temporarily paused the sharing of generative AI models and their parameters based on NIH controlled-access human genomic data, as well as the retention of these tools beyond a project's closeout (see NOT-OD-25-081). NIH is seeking public input on how to protect controlled-access human genomic and associated data when developing generative AI tools and applications, to ensure NIH policies keep pace with evolving generative AI technologies and their potential risks.

Request for Information

Generative AI tools hold tremendous promise for accelerating biomedical research discovery. NIH is committed to fostering responsible innovation in this domain and intends to provide guidance on how researchers can retain and share generative AI models in ways consistent with the GDS Policy’s protections for research participants, the Data Use Certification (DUC) Agreement, and other requirements. NIH seeks information from all interested individuals and communities, including, but not limited to, AI tool developers and users, data repository managers, researchers, universities and research institutions, and other members of the public. Comments are welcome on any aspect of this proposal, including the specific issues identified below:

  1. The degree and types of risks potentially posed by data leakage from generative AI, especially when the underlying data include large-scale human genomic data under access controls or other controlled-access data (see the examples below);
  2. Privacy-enhancing technologies that may be effective in mitigating the risk of data memorization or subsequent data leakage (e.g., via membership inference attacks (MIA)) by generative AI tools across the tool life cycle (a minimal illustrative sketch of one such technique appears after this list);
  3. Any other mitigation strategies that may protect controlled-access data from being memorized or leaked by generative AI tools across the life cycle. 
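
To make item 2 above more concrete, the sketch below illustrates one widely studied privacy-enhancing technique, differentially private stochastic gradient descent (DP-SGD): each example's gradient is clipped to bound any single record's influence, and Gaussian noise is added before the model update. This is a simplified, assumption-laden example; the parameter values are hypothetical and the privacy accountant needed for a formal guarantee is omitted. It is not an NIH-endorsed method.

    import numpy as np

    def dp_sgd_step(params, per_example_grads, clip_norm=1.0,
                    noise_multiplier=1.1, learning_rate=0.01,
                    rng=np.random.default_rng(0)):
        # Illustrative DP-SGD-style update (hypothetical parameter values).
        # 1. Clip each example's gradient so no single record dominates the update.
        clipped = [g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
                   for g in per_example_grads]
        # 2. Average the clipped gradients and add Gaussian noise scaled to the clip norm.
        mean_grad = np.mean(clipped, axis=0)
        noise = rng.normal(0.0, noise_multiplier * clip_norm / len(per_example_grads),
                           size=mean_grad.shape)
        # 3. Apply the noisy update to the model parameters.
        return params - learning_rate * (mean_grad + noise)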

How to Submit a Response

Comments should be submitted electronically to the following webpage: https://osp.od.nih.gov/comment-form-responsibly-developing-and-sharing-generative-artificial-intelligence-tools-using-nih-controlled-access-data. Responses will be accepted through July 16, 2025. Responses to this RFI are voluntary and may be submitted anonymously. You may also voluntarily include your name and contact information with your response. Other than your name and contact information, please do not include in the response any personally identifiable information or any information that you do not wish to make public. Proprietary, classified, confidential, or sensitive information should not be included in your response. After the Office of Science Policy (OSP) has finished reviewing the responses, the responses may be posted to the OSP website without redaction.

Examples of Research on Data Leakage Risk from Generative AI Models and Associated Mitigation Methodologies

  1. Aditya, H., Chawla, S., Dhingra, G., Rai, P., Sood, S., Singh, T., Wase, Z., Bahga, A., & Madisetti, V. (2024). Evaluating privacy leakage and memorization attacks on large language models (LLMs) in generative AI applications. Journal of Software Engineering and Applications, 17, 421–447. https://doi.org/10.4236/jsea.2024.175023 
  2. Khalid, N., Qayyum, A., Bilal, M., Al-Fuqaha, A., & Qadir, J. (2023). Privacy-preserving artificial intelligence in healthcare: Techniques and applications. Computers in Biology and Medicine, 158, 106848. https://doi.org/10.1016/j.compbiomed.2023.106848
  3. Nasr, M., Carlini, N., Hayase, J., Jagielski, M., Cooper, A. F., Ippolito, D., Choquette-Choo, C. A., et al. (2023). Scalable extraction of training data from (production) language models. arXiv preprint arXiv:2311.17035. https://arxiv.org/abs/2311.17035 
  4. Neel, S., & Chang, P. (2023). Privacy issues in large language models: A survey. arXiv preprint arXiv:2312.06717. https://doi.org/10.48550/arXiv.2312.06717
  5. Torkzadehmahani, R., Nasirigerdeh, R., Blumenthal, D. B., Kacprowski, T., List, M., Matschinske, J., et al. (2022). Privacy-preserving artificial intelligence techniques in biomedicine. Methods of Information in Medicine, 61(S 01), e12–e27. https://doi.org/10.1055/s-0041-1740630

Inquiries

Please direct inquiries to the NIH Office of Science Policy:

[email protected]