May 30, 2025
NATIONAL INSTITUTES OF HEALTH (NIH)
NIH is requesting input on effective strategies for mitigating controlled-access human genomic data leakage when developing and sharing generative AI tools and applications. Importantly, these strategies should still promote and enable widespread innovation and adoption of responsible AI for biomedical research.
Background
Training AI tools involves an extensive range of data sources that may include human genomic data from NIH controlled-access data repositories. As these tools become more sophisticated, they may increasingly risk exposing the underlying data on which they were trained, potentially posing privacy risks to research participants. For example, generative AI tools and applications may memorize and leak underlying data, creating serious security and privacy risks when these tools are retained and shared. Consequently, researchers have developed several privacy-preserving techniques and mitigation strategies to address these risks during the development, retention, and sharing of generative AI tools.
To uphold the principles espoused in the NIH Genomic Data Sharing (GDS) Policy, NIH has temporarily paused the sharing of generative AI models and their parameters based on NIH human genomic controlled-access data, as well as the retention of these tools beyond a project's closeout (see NOT-OD-25-081). NIH is seeking public input on how to protect controlled-access human genomic and associated data when developing generative AI tools and applications, to ensure NIH policies keep pace with evolving generative AI technologies and their potential risks.
Request for Information
Generative AI tools hold tremendous promise for accelerating biomedical research discovery. NIH is committed to fostering responsible innovation in this domain and intends to provide guidance on how researchers can retain and share generative AI models in ways consistent with the GDS Policy's protections for research participants, the Data Use Certification (DUC) Agreement, and other requirements. NIH seeks information from all interested individuals and communities, including, but not limited to, AI tool developers and users, data repository managers, researchers, universities and research institutions, and other members of the public. Comments are welcome on any aspect of this proposal, including the specific issues identified below:
How to Submit a Response
Comments should be submitted electronically to the following webpage: https://osp.od.nih.gov/comment-form-responsibly-developing-and-sharing-generative-artificial-intelligence-tools-using-nih-controlled-access-data. Responses will be accepted through July 16, 2025. Responses to this RFI are voluntary and may be submitted anonymously. You may also voluntarily include your name and contact information with your response. Other than your name and contact information, please do not include in the response any personally identifiable information or any information that you do not wish to make public. Proprietary, classified, confidential, or sensitive information should not be included in your response. After the Office of Science Policy (OSP) has finished reviewing the responses, the responses may be posted to the OSP website without redaction.
Examples of Research on Data Leakage Risk from Generative AI Models and Associated Mitigation Methodologies
Please direct inquiries to the NIH Office of Science Policy